DSC 204A: Scalable Data Systems
Instructor: Hao Zhang, UC San Diego, Winter 2024
Announcements
Week 0 Announcements
- Welcome to the Winter 2024 offering of DSC 204A!
- We’re excited to work with you throughout the quarter!
- Check back here for more updates soon!
- We’ll be updating the pages of this site regularly in the first few weeks!
Week 1
- Jan 8
- Reading: N/A
- Survey: Beginning of Quarter Survey
- Jan 10
- 2 Basics: Computer Organization, Operating Systems, Storage
- Slides • Recording • Scribe Notes
- Reading: N/A
- Jan 12
- 3 Basics: Computer Organization, Operating Systems, Storage
- Slides • Recording • Scribe Notes
- Reading: N/A
Week 2
- Jan 17
- 1 Basics: OS - 1
- Slides • Recording • Scribe Notes
- Reading:
- Jan 19
- 2 Basics: OS - 2
- Slides • Recording • Scribe Notes
- Reading:
Week 3
- Jan 22
- 1 Lecture Canceled
- Reading:
- Jan 24
- 2 Basics: OS - 3
- Slides • Recording • Scribe Notes
- Reading:
- Jan 26
- 3 Cloud Computing Introduction
- Slides • Recording • Scribe Notes
- Reading:
Week 4
- Jan 29
- 1 Network - 1
- Slides • Recording • Scribe Notes
- Reading:
- Jan 31
- 2 Network - 2
- Slides • Recording • Scribe Notes
- Reading:
- Feb 2
- 3 Collective Communication - 1
- Slides • Recording • Scribe Notes
- Reading: N/A
Week 5
- Feb 5
- 1 Collective Communication - 2
- Slides • Recording • Scribe Notes
- Reading:
- Feb 7
- 2 Cloud Storage - 1
- Slides • Recording • Scribe Notes
- Reading:
- Feb 9
- 3 Cloud Storage - 2
- Slides • Recording • Scribe Notes
- Reading:
Week 6
- Feb 12
- 1 Parallelism Basics - 1
- Slides • Recording • Scribe Notes
- Reading:
- Feb 14
- 2 Parallelism Basics - 2
- Slides • Recording • Scribe Notes
- Reading:
- Feb 16
- 3 Data Parallelism
- Slides • Recording • Scribe Notes
- Reading:
Week 7
- Feb 19
- 1 Holiday!
- Reading:
- Feb 21
- 2 Guest Lecture - Prof. Yiying Zhang
- Slides • Recording • Scribe Notes
- Reading:
- Feb 23
- 3 Batch Processing - 1
- Slides • Recording • Scribe Notes
- Reading:
Week 8
- Feb 26
- 1 Batch Processing - 2
- Slides • Recording • Scribe Notes
- Reading:
- Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters (required)
- Ray: A Distributed Framework for Emerging AI Applications (required)
- Spark SQL: Relational Data Processing in Spark (required)
- Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling (optional)
- PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs (optional)
- Feb 28
- 2 Stream Processing - 1
- Slides • Recording • Scribe Notes
- Reading:
- Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters (required)
- Ray: A Distributed Framework for Emerging AI Applications (required)
- Spark SQL: Relational Data Processing in Spark (required)
- Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling (optional)
- PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs (optional)
- Mar 1
- 3 Stream Processing - 2
- Slides • Recording • Scribe Notes
- Reading:
- Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters (required)
- Ray: A Distributed Framework for Emerging AI Applications (required)
- Spark SQL: Relational Data Processing in Spark (required)
- Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling (optional)
- PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs (optional)
Week 9
- Mar 4
- 1 Guest Lecture - Stephanie Wang
- Slides • Recording • Scribe Notes
- Reading:
- TensorFlow: A system for large-scale machine learning (required)
- Petuum: A New Platform for Distributed Machine Learning on Big Data (required)
- Scaling Distributed Machine Learning with the Parameter Server (required)
- PipeDream: Generalized Pipeline Parallelism for DNN Training (optional)
- PyTorch Distributed: Experiences on Accelerating Data Parallel Training (optional)
- Mar 6
- 2 ML System - 1
- Slides • Recording • Scribe Notes
- Reading:
- TensorFlow: A system for large-scale machine learning (required)
- Petuum: A New Platform for Distributed Machine Learning on Big Data (required)
- Scaling Distributed Machine Learning with the Parameter Server (required)
- PipeDream: Generalized Pipeline Parallelism for DNN Training (optional)
- PyTorch Distributed: Experiences on Accelerating Data Parallel Training (optional)
- Mar 8
- 3 Guest Lecture - Prof. Ion Stoica
- Slides • Recording • Scribe Notes
- Reading:
- TensorFlow: A system for large-scale machine learning (required)
- Petuum: A New Platform for Distributed Machine Learning on Big Data (required)
- Scaling Distributed Machine Learning with the Parameter Server (required)
- PipeDream: Generalized Pipeline Parallelism for DNN Training (optional)
- PyTorch Distributed: Experiences on Accelerating Data Parallel Training (optional)
Week 10
- Mar 11
- 1 ML System - 2
- Slides • Recording • Scribe Notes
- Reading:
- Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning (optional)
- GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism (optional)
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (optional)
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (optional)
- Efficient Memory Management for Large Language Model Serving with PagedAttention (optional)
- Mar 13
- 2 ML System - 3
- Slides • Recording • Scribe Notes
- Reading:
- Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning (optional)
- GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism (optional)
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (optional)
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (optional)
- Efficient Memory Management for Large Language Model Serving with PagedAttention (optional)
- Mar 15
- 3 ML System - 4
- Slides • Recording • Scribe Notes
- Reading:
- Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning (optional)
- GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism (optional)
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (optional)
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (optional)
- Efficient Memory Management for Large Language Model Serving with PagedAttention (optional)