| Week | Topic |
| --- | --- |
| 1-2 | Basics: deep learning, computational graphs, autodiff, ML frameworks |
| 3 | GPUs, CUDA, collective communication |
| 4 | Graph and memory optimizations |
| 4 | Guest lecture: ML compilers |
| 5 | Data and model parallelism, auto-parallelization |
| 6 | Transformers, LLMs, MoE |
| 6 | Guest lecture: LLM pretraining and open science |
| 7 | LLM training: flash attention, quantization |
| 8 | LLM inference and serving: paged attention, continuous batching, speculative decoding |
| 9 | Guest lecture: fast inference |
| 9 | Scaling laws, test-time compute, reasoning |
| 10 | LLM + X (X = RAG, search, multi-modality, etc.) |
| 10 | Guest lecture: LLMs, tool use, and agents |
| 10 | Final exam review |
| 11 | Final exam |