Profiling FastVideo#
!!! warning Profiling is only intended for FastVideo developers and maintainers to understand the proportion of time spent in different parts of the codebase. FastVideo end-users should never turn on profiling as it will significantly slow down the inference.
Profiling with PyTorch#
FastVideo exposes a process-wide torch profiler that you can enable via environment variables. Set FASTVIDEO_TORCH_PROFILER_DIR
to an absolute directory path to start collecting traces, and specify the regions you want recorded with FASTVIDEO_TORCH_PROFILE_REGIONS
:
FASTVIDEO_TORCH_PROFILER_DIR=/mnt/traces/fastvideo \
FASTVIDEO_TORCH_PROFILE_REGIONS="profiler_region_model_loading,profiler_region_training_step"
All profiled regions must be registered in fastvideo.profiler
; the current list includes:
profiler_region_model_loading
— pipeline/module loadingprofiler_region_inference_pre_denoising
profiler_region_inference_denoising
profiler_region_inference_post_denoising
profiler_region_training_checkpoint_saving
profiler_region_training_dit
profiler_region_training_validation
profiler_region_training_epoch
profiler_region_training_step
profiler_region_training_backward
profiler_region_training_optimizer
profiler_region_distillation_teacher_forward
profiler_region_distillation_student_forward
profiler_region_distillation_loss
profiler_region_distillation_update
While profiling is enabled, FastVideo records additional annotations:
fastvideo.region::<name>
spans are emitted when entering a region.fastvideo.profiler.enable_collection
/fastvideo.profiler.disable_collection
events mark when torch profiler collection is toggled on or off.
Only one profiler instance is created per process; subsequent pipelines reuse the same controller. If you set FASTVIDEO_TORCH_PROFILE_REGIONS
incorrectly (e.g. misspelled name), FastVideo logs a warning and ignores that entry.
Additional knobs:
FASTVIDEO_TORCH_PROFILER_RECORD_SHAPES
FASTVIDEO_TORCH_PROFILER_WITH_PROFILE_MEMORY
FASTVIDEO_TORCH_PROFILER_WITH_STACK
FASTVIDEO_TORCH_PROFILER_WITH_FLOPS
Traces can be visualized using https://ui.perfetto.dev/.
Best Practices#
Keep the profiled step count small; traces can be large and slow down job shutdown while the profiler flushes data.
After profiling, clean up trace directories to avoid filling disks.
When adding new regions, register them in
fastvideo.profiler
and wrap the corresponding code block withwith self.profiler_controller.region("your_region"):
or the@profile_region
decorator.