2024
Efficient LLM Scheduling by Learning to Rank
Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang
NeurIPS 2024
[code]
MPC-Minimized Secure LLM Inference
Deevashwer Rathee*, Dacheng Li*, Ion Stoica, Hao Zhang, Raluca Ada Popa
Preprint 2024
Optimizing Speculative Decoding for Serving Large Language Models Using Goodput
Xiaoxuan Liu, Cade Daniel, Lanxiang Hu, Woosuk Kwon, Zhuohan Li, Xiangxi Mo, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang
Preprint 2024
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models
Zihao Zeng, Yibo Miao, Hongcheng Gao, Hao Zhang, Zhijie Deng
EMNLP 2024
Toward Inference-optimal Mixture-of-Expert Large Language Models
Longfei Yun*, Yonghao Zhuang*, Yao Fu, Eric P Xing, Hao Zhang
Preprint 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Xuezhe Ma*, Xiaomeng Yang*, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou*
NeurIPS 2024
[code]
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving
Jiangfei Duan, Runyu Lu, Haojie Duanmu, Xiuhong Li, Xingcheng Zhang, Dahua Lin, Ion Stoica, Hao Zhang
ICML 2024
[code]
[project]
CLLMs: Consistency Large Language Models
Siqi Kou*, Lanxiang Hu*, Zhezhi He, Zhijie Deng, Hao Zhang
ICML 2024
[code]
[project]
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, Hao Zhang
OSDI 2024
[code]
[project]
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang*, Lianmin Zheng*, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Banghua Zhu, Hao Zhang, Michael Jordan, Joseph E. Gonzalez, Ion Stoica
ICML 2024
[code]
[project]
InferCept: Efficient Intercept Support for Augmented Large Language Model Inference
Reyna Abhyankar*, Zijian He*, Vikranth Srivatsa, Hao Zhang, Yiying Zhang
ICML 2024
[code]
[project]
Break the Sequential Dependency of LLM Inference using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang
ICML 2024
[code]
[project]
Online Speculative Decoding
Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang
ICML 2024
DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
Dacheng Li*, Rulin Shao*, Anze Xie, Eric P Xing, Joseph E Gonzalez, Ion Stoica, Xuezhe Ma, Hao Zhang
COLM 2024
[code]
Efficient Detection of LLM-generated Texts with a Bayesian Surrogate Model
Yibo Miao*, Hongcheng Gao*, Hao Zhang, Zhijie Deng
ACL 2024
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Lianmin Zheng*, Wei-Lin Chiang*, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric Xing, Joseph E Gonzalez, Ion Stoica, Hao Zhang
ICLR 2024
[project]
2023
How Long Can Context Length of Open-Source LLMs truly Promise?
Dacheng Li*, Rulin Shao*, Anze Xie, Ying Sheng, Lianmin Zheng, Joseph Gonzalez, Ion Stoica, Xuezhe Ma, Hao Zhang
Instruction Tuning and Instruction Following Workshop @ NeurIPS 2023
[project]
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Lianmin Zheng*, Wei-Lin Chiang*, Ying Sheng*, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, Hao Zhang, Joseph E Gonzalez, Ion Stoica
NeurIPS 2023
[code]
[project]
Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks
Hongcheng Gao, Hao Zhang, Yinpeng Dong, Zhijie Deng
Preprint 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon*, Zhuohan Li*, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Yu, Joey Gonzalez, Hao Zhang, Ion Stoica
SOSP 2023
[code]
[project]
Vicuna: An Open-source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality
Wei-Lin Chiang†, Zhuohan Li†, Zi Lin†, Ying Sheng†, Zhanghao Wu†, Hao Zhang†, Lianmin Zheng†, Siyuan Zhuang†, Yonghao Zhuang†, Joseph E Gonzalez†, Ion Stoica†, Eric P Xing†
Blogpost 2023
[code]
[project]
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Zhuohan Li*, Lianmin Zheng*, Yinmin Zhong*, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E Gonzalez, Ion Stoica
OSDI 2023
[code]
[project]
On Optimizing the Communication of Model Parallelism
Yonghao Zhuang*, Hexu Zhao*, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang
MLSys 2023
[code]
MPCFormer: Fast, Performant and Private Transformer Inference with MPC
Dacheng Li*, Rulin Shao*, Hongyi Wang*, Han Guo, Eric P. Xing, Hao Zhang
ICLR 2023
(Notable-top-25%)
[code]
2022
AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness
Dacheng Li, Hongyi Wang, Eric Xing, Hao Zhang
NeurIPS 2022
[code]
Neural Eigenfunctions Are Structured Representation Learners
Zhijie Deng, Jiaxin Shi, Hao Zhang, Peng Cui, Cewu Lu, Jun Zhu
Preprint 2022
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
Lianmin Zheng*, Zhuohan Li*, Hao Zhang*, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica
OSDI 2022
[project]
2021
Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation
Gengwei Zhang, Yiming Gao, Hang Xu, Hao Zhang, Zhenguo Li, Xiaodan Liang
AAAI 2021
TeraPipe: Token-level Pipeline Parallelism for Training Large-scale Language Models
Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica
ICML 2021
[code]
Pollux: Co-adaptive Cluster Scheduling for Goodput-optimized Deep Learning
Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, Gregory R Ganger, Eric P Xing
OSDI 2021
(Jay Lepreau Best Paper Award)
[code]
2020
Machine Learning Parallelism Could Be Adaptive, Composable, and Automated
Hao Zhang
CMU PhD Dissertation 2020
AutoSync: Learning to Synchronize for Data-parallel Distributed Deep Learning
Hao Zhang*, Yuan Li*, Zhijie Deng, Xiaodan Liang, Lawrence Carin, Eric Xing
NeurIPS 2020
[code]
2019
Toward Understanding the Impact of Staleness in Distributed Machine Learning
Wei Dai, Yi Zhou, Nanqing Dong, Hao Zhang, Eric P Xing
ICLR 2019
AutoLoss: Learning Discrete Schedules for Alternate Optimization
Haowen Xu*, Hao Zhang*, Zhiting Hu, Xiaodan Liang, Ruslan Salakhutdinov, Eric Xing
ICLR 2019
[code]
SCAN: Structure Correcting Adversarial Network for Organ Segmentation in Chest X-rays
Wei Dai, Nanqing Dong, Zeya Wang, Xiaodan Liang, Hao Zhang, Eric P Xing
Workshop on Deep Learning in Medical Image Analysis 2019
2018
Symbolic Graph Reasoning Meets Convolutions
Xiaodan Liang, Zhiting Hu, Hao Zhang, Liang Lin, Eric P Xing
NeurIPS 2018
Generative Semantic Manipulation with Mask-contrasting GAN
Xiaodan Liang, Hao Zhang, Liang Lin, Eric Xing
ECCV 2018
Cavs: An Efficient Runtime System for Dynamic Neural Networks
Shizhen Xu*, Hao Zhang*, Graham Neubig, Wei Dai, Jin Kyu Kim, Zhijie Deng, Qirong Ho, Guangwen Yang, Eric P Xing
ATC 2018
[code]
2017
ZM-Net: Real-time Zero-shot Image Manipulation Network
Hao Wang, Xiaodan Liang, Hao Zhang, Dit-Yan Yeung, Eric P Xing
Preprint 2017
Structured Generative Adversarial Networks
Zhijie Deng*, Hao Zhang*, Xiaodan Liang, Luona Yang, Shizhen Xu, Jun Zhu, Eric P Xing
NeurIPS 2017
(Nvidia Pioneer Research Award)
[code]
Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Eric P Xing
ATC 2017
[code]
[project]
Recurrent Topic-transition GAN for Visual Paragraph Generation
Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P Xing
ICCV 2017
2016
On the Reducibility of Submodular Functions
Jincheng Mei, Hao Zhang, Bao-Liang Lu
AISTATS 2016
GeePS: Scalable Deep Learning on Distributed GPUs with a GPU-specialized Parameter Server
Henggang Cui, Hao Zhang, Gregory R Ganger, Phillip B Gibbons, Eric P Xing
EuroSys 2016
[code]
Combining the Best of Convolutional Layers and Recurrent Layers: A Hybrid Network for Semantic Segmentation
Zhicheng Yan, Hao Zhang, Yangqing Jia, Thomas Breuel, Yizhou Yu
Preprint 2016
Learning Concept Taxonomies from Multi-modal Data
Hao Zhang, Zhiting Hu, Yuntian Deng, Mrinmaya Sachan, Zhicheng Yan, Eric P. Xing
ACL 2016
2015
Automatic Photo Adjustment Using Deep Neural Networks
Zhicheng Yan, Hao Zhang, Baoyuan Wang, Sylvain Paris, Yizhou Yu
ACM Transactions on Graphics 2015
[code]
Dynamic Topic Modeling for Monitoring Market Competition from Online Text and Image Data
Hao Zhang, Gunhee Kim, Eric P Xing
KDD 2015
[project]
A Boosting-Based Spatial-Spectral Model for Stroke Patients' EEG Analysis in Rehabilitation Training
Ye Liu, Hao Zhang, Min Chen, Liqing Zhang
Transactions on Neural Systems and Rehabilitation Engineering 2015
HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition
Zhicheng Yan, Hao Zhang, Robinson Piramuthu, Vignesh Jagadeesh, Dennis DeCoste, Wei Di, Yizhou Yu
ICCV 2015
[code]
2014
A Tensor-based Scheme for Stroke Patients’ Motor Imagery EEG Analysis in BCI-FES Rehabilitation Training
Ye Liu, Mingfen Li, Hao Zhang, Hang Wang, Junhua Li, Jie Jia, Yi Wu, Liqing Zhang
Journal of Neuroscience Methods 2014
Common Spatial-spectral Boosting Pattern for Brain-computer Interface
Ye Liu, Hao Zhang, Qibin Zhao, Liqing Zhang
ECAI 2014
2013
Gaussian Mixture Modeling in Stroke Patients' Rehabilitation EEG Data Analysis
Hao Zhang, Ye Liu, Jianyi Liang, Jianting Cao, Liqing Zhang
EMBC 2013
Single-trial Discrimination of EEG Signals for Stroke Patients: a General Multi-way Analysis
Ye Liu, Mingfen Li, Hao Zhang, Junhua Li, Jie Jia, Yi Wu, Jianting Cao, Liqing Zhang
EMBC 2013