LLM Serving
April 1, 13:30–15:10 (CET), Rotterdam Hall 1B

Session chair: Luo Mai

CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
Jiayi Yao (The Chinese University of Hong Kong, Shenzhen), Hanchen Li (University of Chicago), Yuhan Liu (University of Chicago), Siddhant Ray (University of Chicago), Yihua Cheng (University of Chicago), Qizheng Zhang (Stanford University), Kuntai Du (University of Chicago), Shan Lu (Microsoft Research and University of Chicago), Junchen Jiang (University of Chicago)

DeltaZip: Efficient Serving of Multiple Full-Model-Tuned LLMs
Xiaozhe Yao (ETH Zurich), Qinghao Hu (MIT), Ana Klimovic (ETH Zurich)

Fast State Restoration in LLM Serving with HCache
Shiwei Gao (Tsinghua University), Youmin Chen (Tsinghua University), Jiwu Shu (Tsinghua University)

Stateful Large Language Model Serving with Pensieve
Lingfan Yu (New York University), Jinkun Lin (New York University), Jinyang Li (New York University)

SkyServe: Serving AI Models across Regions and Clouds with Spot Instances
Ziming Mao (UC Berkeley), Tian Xia (UC Berkeley), Zhanghao Wu (UC Berkeley), Wei-Lin Chiang (UC Berkeley), Tyler Griggs (UC Berkeley), Romil Bhardwaj (UC Berkeley), Zongheng Yang (UC Berkeley), Scott Shenker (ICSI and UC Berkeley), Ion Stoica (UC Berkeley)