Fast and Accurate Causal Parallel Decoding using Jacobi Forcing
Ultra-Fast Diffusion LLM 🚀
Make Video Generation Faster
Making Reasoning Models More Token-Efficient
Evaluating LLM Reasoning through Live Computer Games
Efficient LLM Scheduling by Learning to Rank
Serving Multiple LLMs with Flexible Spatial-Temporal Multiplexing
Consistency Large Language Models: A Family of Efficient Parallel Decoders
Maximizing Goodput in LLM Serving using Prefill-Decode Disaggregation