04-03, 09:00–10:40 (CET), Rotterdam Hall 1A
Session Chair: Thaleia Doudali (IMDEA Software)
Accelerating LLM Serving for Multi-turn Dialogues with Efficient Resource Management
Jinwoo Jeong (Korea University), Jeongseob Ahn (Korea University)
COMET: Towards Practical W4A4KV4 LLMs Serving
Lian Liu (Institute of Computing Technology, CAS; University of Chinese Academy of Sciences), Long Cheng (North China Electric Power University), Haimeng Ren (ShanghaiTech University), Zhaohui Xu (ShanghaiTech University), Yudong Pan (Institute of Computing Technology, CAS; University of Chinese Academy of Sciences), Mengdi Wang (Institute of Computing Technology, CAS), Xiaowei Li (Institute of Computing Technology, CAS; Zhongguancun Laboratory), Yinhe Han (Institute of Computing Technology, CAS), Ying Wang (Institute of Computing Technology, CAS)
Past-Future Scheduler for LLM Serving under SLA Guarantees
Ruihao Gong (Beihang University), Shihao Bai (SenseTime), Siyu Wu (Beihang University), Yunqian Fan (SenseTime), Zaijun Wang (SenseTime), Xiuhong Li (Peking University), Hailong Yang (Beihang University), Xianglong Liu (Beihang University)
POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference
Aditya K Kamath (Paul G. Allen School of Computer Science and Engineering, University of Washington), Ramya Prabhu (Microsoft Research India), Jayashree Mohan (Microsoft Research India), Simon Peter (Paul G. Allen School of Computer Science and Engineering, University of Washington), Ramachandran Ramjee (Microsoft Research India), Ashish Panwar (Microsoft Research India)
TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms
Jovan Stojkovic (University of Illinois at Urbana-Champaign), Chaojie Zhang (Microsoft Azure Research), Íñigo Goiri (Microsoft Azure Research), Esha Choukse (Microsoft Azure Research), Haoran Qiu (Microsoft Azure Research), Rodrigo Fonseca (Microsoft Azure Research), Josep Torrellas (University of Illinois at Urbana-Champaign), Ricardo Bianchini (Microsoft Azure)