04-02, 14:00–15:40 (CET), Rotterdam Hall 1A
Session Chair: Xiaosong Ma (MBZUAI)
Fast On-device LLM Inference with NPUs
Daliang Xu (Key Lab of HCST (PKU), MOE; SCS, Peking University), Hao Zhang (Beijing University of Posts and Telecommunications), Liming Yang (Key Lab of HCST (PKU), MOE; SCS, Peking University), Ruiqi Liu (Key Lab of HCST (PKU), MOE; SCS, Peking University), Gang Huang (Key Lab of HCST (PKU), MOE; SCS, Peking University, National Key Laboratory of Data Space Technology and System), Mengwei Xu (Beijing University of Posts and Telecommunications), Xuanzhe Liu (Key Lab of HCST (PKU), MOE; SCS, Peking University)
Paper
Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow
Yixuan Mei (Carnegie Mellon University), Yonghao Zhuang (Carnegie Mellon University), Xupeng Miao (Carnegie Mellon University), Juncheng Yang (Carnegie Mellon University), Zhihao Jia (Carnegie Mellon University), Rashmi Vinayak (Carnegie Mellon University)
Paper
FlexSP: Accelerating Large Language Model Training via Flexible Sequence Parallelism
Yujie Wang (Peking University), Shiju Wang (Beihang University), Shenhan Zhu (Peking University), Fangcheng Fu (Peking University), Xinyi Liu (Peking University), Xuefeng Xiao (ByteDance Inc.), Huixia Li (ByteDance Inc.), Jiashi Li (ByteDance Inc.), Faming Wu (ByteDance Inc.), Bin Cui (Peking University)
Paper
Spindle: Efficient Distributed Training of Multi-Task Large Models via Wavefront Scheduling
Yujie Wang (Peking University), Shenhan Zhu (Peking University), Fangcheng Fu (Peking University), Xupeng Miao (Purdue University), Jie Zhang (Alibaba Group), Juan Zhu (Alibaba Group), Fan Hong (Alibaba Group), Yong Li (Alibaba Group), Bin Cui (Peking University)
Paper
Vela: A Virtualized LLM Training System with GPU Direct RoCE
Apoorve Mohan (IBM Research), Robert Walkup (IBM Research), Bengi Karacali (IBM Research), Ming-Hung Chen (IBM Research), Abdullah Kayi (IBM Research), Liran Schour (IBM), Shweta Salaria (IBM Research), Sophia Wen (IBM Research), IHsin Chung (IBM Research), Abdul Alim (IBM Research), Constantinos Evangelinos (IBM Research), Lixiang Luo (IBM Research), Marc Dombrowa (IBM Research), Laurent Schares (IBM Research), Ali Sydney (IBM Research), Pavlos Maniotis (IBM Research), Sandhya Koteshwara (IBM Research), Brent Tang (IBM), Joel Belog (IBM), Rei Odaira (IBM), Vasily Tarasov (IBM Research), Eran Gampel (IBM Cloud), Drew Thorstensen (IBM), Talia Gershon (IBM Research), Seetharami Seelam (IBM Research)
Paper