Mixture of Experts
04-02, 16:10–17:50 (CET), Rotterdam hall 1A


Session Chair: Chaojie Zhang (Microsoft)

MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs
Shiyi Cao (UC Berkeley), Shu Liu (UC Berkeley), Tyler Griggs (UC Berkeley), Peter Schafhalter (UC Berkeley), Xiaoxuan Liu (UC Berkeley), Ying Sheng (Stanford University), Joseph E. Gonzalez (UC Berkeley), Matei Zaharia (UC Berkeley), Ion Stoica (UC Berkeley)

FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models
Xinglin Pan (The Hong Kong University of Science and Technology (Guangzhou)), Wenxiang Lin (Harbin Institute of Technology, Shenzhen), Lin Zhang (Hong Kong University of Science and Technology), Shaohuai Shi (Harbin Institute of Technology, Shenzhen), Zhenheng Tang (The Hong Kong University of Science and Technology), Rui Wang (The Hong Kong University of Science and Technology (Guangzhou)), Bo Li (Hong Kong University of Science and Technology), Xiaowen Chu (The Hong Kong University of Science and Technology (Guangzhou), Hong Kong University of Science and Technology)

CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory
Jiashun Suo (State Key Laboratory of CCSE and School of Computer Science and Engineering, Beihang University), Xiaojian Liao (State Key Laboratory of CCSE and School of Computer Science and Engineering, Beihang University), Limin Xiao (State Key Laboratory of CCSE and School of Computer Science and Engineering, Beihang University), Li Ruan (State Key Laboratory of CCSE and School of Computer Science and Engineering, Beihang University), Jinquan Wang (State Key Laboratory of CCSE and School of Computer Science and Engineering, Beihang University), Xiao Su (State Key Laboratory of CCSE and School of Computer Science and Engineering, Beihang University), Zhisheng Huo (State Key Laboratory of CCSE and School of Computer Science and Engineering, Beihang University)

Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline
Zhiyuan Fang (Sun Yat-sen University), Yuegui Huang (Sun Yat-sen University), Zicong Hong (Hong Kong University of Science and Technology), Yufeng Lyu (Huawei Technologies Co. Ltd), Wuhui Chen (Sun Yat-sen University, Peng Cheng Laboratory), Yue Yu (Peng Cheng Laboratory), Fan Yu (Huawei Technologies Co. Ltd), Zibin Zheng (Sun Yat-sen University)

MoC-System: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training
Weilin Cai (The Hong Kong University of Science and Technology (Guangzhou)), Le Qin (The Hong Kong University of Science and Technology (Guangzhou)), Jiayi Huang (The Hong Kong University of Science and Technology (Guangzhou))