Profiling FastVideo
!!! warning
    Profiling is intended only for FastVideo developers and maintainers who need to see how much time is spent in different parts of the codebase. End users should not enable it, because it significantly slows down inference.
Profiling with PyTorch
FastVideo exposes a process-wide torch profiler that you can enable via environment variables. Set FASTVIDEO_TORCH_PROFILER_DIR to an absolute directory path to start collecting traces, and specify the regions you want recorded with FASTVIDEO_TORCH_PROFILE_REGIONS:
FASTVIDEO_TORCH_PROFILER_DIR=/mnt/traces/fastvideo \
FASTVIDEO_TORCH_PROFILE_REGIONS="profiler_region_model_loading,profiler_region_training_step"
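When a profiled run is launched from a script rather than a shell, the same two variables can be passed through the child process environment. The sketch below shows one way to build that environment. `build_profiler_env` is a hypothetical helper, not part of FastVideo, and its absolute-path check simply mirrors the requirement stated above.

```python
import os
from pathlib import PurePosixPath


def build_profiler_env(trace_dir, regions, base_env=None):
    """Return a copy of the environment with the profiler variables set.

    Hypothetical helper for illustration; not part of FastVideo.
    """
    if not PurePosixPath(trace_dir).is_absolute():
        raise ValueError(f"trace dir must be an absolute path: {trace_dir!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["FASTVIDEO_TORCH_PROFILER_DIR"] = str(trace_dir)
    # Regions are passed as a single comma-separated string.
    env["FASTVIDEO_TORCH_PROFILE_REGIONS"] = ",".join(regions)
    return env


env = build_profiler_env(
    "/mnt/traces/fastvideo",
    ["profiler_region_model_loading", "profiler_region_training_step"],
    base_env={},
)
print(env["FASTVIDEO_TORCH_PROFILE_REGIONS"])
```

The returned mapping can be handed to `subprocess.run(..., env=env)` together with whatever launch command you normally use.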
All profiled regions must be registered in fastvideo.profiler; the current list includes:
- profiler_region_model_loading (pipeline/module loading)
- profiler_region_inference_pre_denoising
- profiler_region_inference_denoising
- profiler_region_inference_post_denoising
- profiler_region_training_checkpoint_saving
- profiler_region_training_dit
- profiler_region_training_validation
- profiler_region_training_epoch
- profiler_region_training_step
- profiler_region_training_backward
- profiler_region_training_optimizer
- profiler_region_distillation_teacher_forward
- profiler_region_distillation_student_forward
- profiler_region_distillation_loss
- profiler_region_distillation_update
While profiling is enabled, FastVideo records additional annotations:
- `fastvideo.region::<name>` spans are emitted when entering a region.
- `fastvideo.profiler.enable_collection` / `fastvideo.profiler.disable_collection` events mark when torch profiler collection is toggled on or off.
Only one profiler instance is created per process; subsequent pipelines reuse the same controller. If FASTVIDEO_TORCH_PROFILE_REGIONS contains an unrecognized entry (e.g. a misspelled region name), FastVideo logs a warning and ignores that entry.
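The warn-and-ignore behavior for unknown region names can be pictured with the following sketch. It is not FastVideo's actual parser: `parse_regions`, `KNOWN_REGIONS`, and the logger name are hypothetical, and the registry here contains only a few of the region names listed above.

```python
import logging

logger = logging.getLogger("profiler_sketch")

# A subset of the region names listed above, standing in for the registry.
KNOWN_REGIONS = frozenset({
    "profiler_region_model_loading",
    "profiler_region_inference_denoising",
    "profiler_region_training_step",
})


def parse_regions(raw, known=KNOWN_REGIONS):
    """Split a comma-separated region string, dropping unknown names."""
    enabled = []
    for name in (part.strip() for part in raw.split(",")):
        if not name:
            continue  # tolerate empty entries such as a trailing comma
        if name not in known:
            logger.warning("Unknown profiler region %r; ignoring it", name)
            continue
        enabled.append(name)
    return enabled


# The second entry is misspelled, so only the first one is kept.
print(parse_regions("profiler_region_training_step,profiler_region_trainng_step"))
```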
Additional knobs:
- FASTVIDEO_TORCH_PROFILER_RECORD_SHAPES
- FASTVIDEO_TORCH_PROFILER_WITH_PROFILE_MEMORY
- FASTVIDEO_TORCH_PROFILER_WITH_STACK
- FASTVIDEO_TORCH_PROFILER_WITH_FLOPS
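The names of these knobs suggest that each one toggles the matching keyword argument of `torch.profiler.profile` (`record_shapes`, `profile_memory`, `with_stack`, `with_flops`). The sketch below works under that assumption; the mapping, the helper name `profiler_kwargs_from_env`, and the set of values treated as "on" are all guesses for illustration, not FastVideo's documented behavior.

```python
# Assumed mapping from each knob to a torch.profiler.profile keyword argument.
KNOB_TO_KWARG = {
    "FASTVIDEO_TORCH_PROFILER_RECORD_SHAPES": "record_shapes",
    "FASTVIDEO_TORCH_PROFILER_WITH_PROFILE_MEMORY": "profile_memory",
    "FASTVIDEO_TORCH_PROFILER_WITH_STACK": "with_stack",
    "FASTVIDEO_TORCH_PROFILER_WITH_FLOPS": "with_flops",
}

# Assumed spellings for "enabled"; the source does not state accepted values.
TRUTHY = {"1", "true", "yes", "on"}


def profiler_kwargs_from_env(environ):
    """Translate knob variables in a mapping into profiler keyword arguments."""
    return {
        kwarg: environ.get(var, "").strip().lower() in TRUTHY
        for var, kwarg in KNOB_TO_KWARG.items()
    }


kwargs = profiler_kwargs_from_env({"FASTVIDEO_TORCH_PROFILER_WITH_STACK": "1"})
print(kwargs)
```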
Traces can be visualized using https://ui.perfetto.dev/.
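For a quick numeric summary without opening the UI, a trace can also be inspected in Python. The sketch below assumes the trace is in the Chrome trace event JSON format (a top-level `traceEvents` list in which complete events have `"ph": "X"` and a `dur` in microseconds), which is what Perfetto reads. The helper name `region_totals_ms` and the sample events are made up for illustration.

```python
import json
from collections import defaultdict

REGION_PREFIX = "fastvideo.region::"


def region_totals_ms(trace):
    """Sum the duration of each fastvideo.region::<name> span, in milliseconds."""
    totals = defaultdict(float)
    for event in trace.get("traceEvents", []):
        name = event.get("name", "")
        # "X" marks a complete event; its "dur" field is in microseconds.
        if event.get("ph") == "X" and name.startswith(REGION_PREFIX):
            totals[name[len(REGION_PREFIX):]] += event.get("dur", 0) / 1000.0
    return dict(totals)


# Tiny synthetic trace standing in for a real trace file.
sample = json.loads("""
{"traceEvents": [
  {"ph": "X", "name": "fastvideo.region::profiler_region_training_step", "ts": 0, "dur": 1500},
  {"ph": "X", "name": "fastvideo.region::profiler_region_training_step", "ts": 2000, "dur": 2500},
  {"ph": "X", "name": "aten::mm", "ts": 10, "dur": 300},
  {"ph": "i", "name": "fastvideo.profiler.enable_collection", "ts": 0}
]}
""")
print(region_totals_ms(sample))
```

For a real trace, load the file with `json.load` (or via `gzip.open` if it is compressed) and pass the resulting dict to the same function.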
Best Practices
- Keep the profiled step count small; traces can be large and slow down job shutdown while the profiler flushes data. 
- After profiling, clean up trace directories to avoid filling disks. 
- When adding new regions, register them in `fastvideo.profiler` and wrap the corresponding code block in `with self.profiler_controller.region("your_region"):` or use the `@profile_region` decorator.
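The two ways of marking a region in the last point can be sketched with a self-contained stand-in. `SketchProfilerController` and `ToyPipeline` are hypothetical, and the decorator signature (taking the region name as its argument) is an assumption; FastVideo's real controller and `@profile_region` may differ.

```python
import contextlib
import functools


class SketchProfilerController:
    """Stand-in controller that records enter/exit events for enabled regions."""

    def __init__(self, enabled_regions):
        self.enabled_regions = set(enabled_regions)
        self.events = []

    @contextlib.contextmanager
    def region(self, name):
        active = name in self.enabled_regions
        if active:
            self.events.append(("enter", name))
        try:
            yield
        finally:
            if active:
                self.events.append(("exit", name))


def profile_region(name):
    """Decorator form: run the method inside self.profiler_controller.region(name)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            with self.profiler_controller.region(name):
                return fn(self, *args, **kwargs)
        return wrapper
    return decorator


class ToyPipeline:
    def __init__(self, controller):
        self.profiler_controller = controller

    @profile_region("your_region")
    def step(self):
        return "done"

    def load(self):
        # Context-manager form; this region is not enabled below.
        with self.profiler_controller.region("other_region"):
            return "loaded"


controller = SketchProfilerController(enabled_regions=["your_region"])
pipeline = ToyPipeline(controller)
pipeline.step()
pipeline.load()
print(controller.events)
```

Only the enabled region leaves events behind, which mirrors how regions that are not selected in FASTVIDEO_TORCH_PROFILE_REGIONS are skipped.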