Publications

Fast Distributed Inference Serving for Large Language Models
Bingyang Wu*, Yinmin Zhong*, Zili Zhang*, Gang Huang, Xuanzhe Liu, Xin Jin
(* Equal contribution)
Preprint.
[PDF] [Slides]

dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
Bingyang Wu, Ruidong Zhu, Zili Zhang, Peng Sun, Xuanzhe Liu, Xin Jin
USENIX Symposium on Operating Systems Design and Implementation (OSDI 2024), Santa Clara, July 10–12, 2024 (To appear).
[PDF] [Slides]

Jolteon: Unleashing the Promise of Serverless for Serverless Workflows
Zili Zhang, Chao Jin, Xin Jin
USENIX Symposium on Networked Systems Design and Implementation (NSDI 2024), Santa Clara, April 16–18, 2024 (To appear).
[PDF] [Slides]

Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered Pipelining
Zili Zhang, Fangyue Liu, Gang Huang, Xuanzhe Liu, Xin Jin
USENIX Symposium on Networked Systems Design and Implementation (NSDI 2024), Santa Clara, April 16–18, 2024 (To appear).
[PDF] [Slides]

Ditto: Efficient Serverless Analytics with Elastic Parallelism
Chao Jin, Zili Zhang, Xingyu Xiang, Songyun Zou, Gang Huang, Xuanzhe Liu, Xin Jin
ACM Special Interest Group on Data Communication (SIGCOMM 2023), New York City, September 10–14, 2023.
[PDF] [Slides]

Fast, Approximate Vector Queries on Very Large Unstructured Datasets
Zili Zhang, Chao Jin, Linpeng Tang, Xuanzhe Liu, Xin Jin
USENIX Symposium on Networked Systems Design and Implementation (NSDI 2023), Boston, April 17–19, 2023.
[PDF] [Slides]

Transparent GPU Sharing in Container Clouds for Deep Learning Workloads
Bingyang Wu, Zili Zhang, Zhihao Bai, Xuanzhe Liu, Xin Jin
USENIX Symposium on Networked Systems Design and Implementation (NSDI 2023), Boston, April 17–19, 2023.
[PDF] [Slides]

Rise of Distributed Deep Learning Training in the Big Model Era: From A Software Engineering Perspective
Xuanzhe Liu, Diandian Gu, Zhenpeng Chen, Jinfeng Wen, Zili Zhang, Yun Ma, Haoyu Wang, Xin Jin
ACM Transactions on Software Engineering and Methodology (TOSEM 2023), 2023.
[PDF] [Slides]

Optimizing Half Precision Winograd Convolution on ARM Many-Core Processors
Dedong Xie, Zhen Jia, Zili Zhang, Xin Jin
ACM Asia-Pacific Workshop on Systems (APSys 2022), online, August 23–24, 2022.
[PDF] [Slides]