LLM Serving
04-01, 15:40–17:00 (CET), Rotterdam hall 1A


Session chair: Thaleia Doudali

SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
Ruibo Fan (Data Science and Analytics Thrust, HKUST(GZ)), Xiangrui Yu (Data Science and Analytics Thrust, HKUST(GZ)), Peijie Dong (Data Science and Analytics Thrust, HKUST(GZ)), Zeyu Li (Data Science and Analytics Thrust, HKUST(GZ)), Gu Gong (Data Science and Analytics Thrust, HKUST(GZ)), Qiang Wang (Harbin Institute of Technology (Shenzhen)), Wei Wang (Hong Kong University of Science and Technology), Xiaowen Chu (Data Science and Analytics Thrust, HKUST(GZ))
Paper

Empower Vision Applications with LoRA LMM
Liang Mi (Nanjing University), Weijun Wang (Institute for AI Industry Research (AIR), Tsinghua University), Wenming Tu (Institute for AI Industry Research (AIR), Tsinghua University), Qingfeng He (Institute for AI Industry Research (AIR), Tsinghua University), Kui Kong (Institute for AI Industry Research (AIR), Tsinghua University), Xinyu Fang (Institute for AI Industry Research (AIR), Tsinghua University), Yazhu Dong (Institute for AI Industry Research (AIR), Tsinghua University), Yikang Zhang (Nanjing University), Yuanchun Li (Institute for AI Industry Research (AIR), Tsinghua University), Meng Li (Nanjing University), Haipeng Dai (Nanjing University), Guihai Chen (Nanjing University), Yunxin Liu (Institute for AI Industry Research (AIR), Tsinghua University), Weijun Wang (Tsinghua University)
Paper

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
Jianyu Wei (USTC and Microsoft Research), Shijie Cao (Microsoft Research), Ting Cao (Microsoft Research), Lingxiao Ma (Microsoft Research), Lei Wang (UCAS and Microsoft Research), Yanyong Zhang (University of Science and Technology of China), Mao Yang (Microsoft Research)
Paper

Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores
Chenpeng Wu (Shanghai Jiao Tong University), Qiqi Gu (Shanghai Jiao Tong University), Heng Shi (Shanghai Enflame Technology Co., Ltd.; Shanghai Jiao Tong University), Jianguo Yao (Shanghai Jiao Tong University), Haibing Guan (Shanghai Jiao Tong University)
Paper