Dynasor

Making Reasoning Models More Token-Efficient

GameArena

Evaluating LLM Reasoning through Live Computer Games

vLLM-LTR

Efficient LLM Scheduling by Learning to Rank

MuxServe

Serving Multiple LLMs with Flexible Spatial-Temporal Multiplexing

CLLM

Consistency Large Language Models: A Family of Efficient Parallel Decoders

DistServe

Maximizing Goodput in LLM Serving using Prefill-Decode Disaggregation