Profiling FastVideo

Profiling FastVideo#

!!! warning Profiling is only intended for FastVideo developers and maintainers to understand the proportion of time spent in different parts of the codebase. FastVideo end-users should never turn on profiling as it will significantly slow down the inference.

Profiling with PyTorch#

FastVideo exposes a process-wide torch profiler that you can enable via environment variables. Set FASTVIDEO_TORCH_PROFILER_DIR to an absolute directory path to start collecting traces, and specify the regions you want recorded with FASTVIDEO_TORCH_PROFILE_REGIONS:

FASTVIDEO_TORCH_PROFILER_DIR=/mnt/traces/fastvideo \
FASTVIDEO_TORCH_PROFILE_REGIONS="profiler_region_model_loading,profiler_region_training_step"

All profiled regions must be registered in fastvideo.profiler; the current list includes:

  • profiler_region_model_loading — pipeline/module loading

  • profiler_region_inference_pre_denoising

  • profiler_region_inference_denoising

  • profiler_region_inference_post_denoising

  • profiler_region_training_checkpoint_saving

  • profiler_region_training_dit

  • profiler_region_training_validation

  • profiler_region_training_epoch

  • profiler_region_training_step

  • profiler_region_training_backward

  • profiler_region_training_optimizer

  • profiler_region_distillation_teacher_forward

  • profiler_region_distillation_student_forward

  • profiler_region_distillation_loss

  • profiler_region_distillation_update

While profiling is enabled, FastVideo records additional annotations:

  • fastvideo.region::<name> spans are emitted when entering a region.

  • fastvideo.profiler.enable_collection / fastvideo.profiler.disable_collection events mark when torch profiler collection is toggled on or off.

Only one profiler instance is created per process; subsequent pipelines reuse the same controller. If you set FASTVIDEO_TORCH_PROFILE_REGIONS incorrectly (e.g. misspelled name), FastVideo logs a warning and ignores that entry.

Additional knobs:

  • FASTVIDEO_TORCH_PROFILER_RECORD_SHAPES

  • FASTVIDEO_TORCH_PROFILER_WITH_PROFILE_MEMORY

  • FASTVIDEO_TORCH_PROFILER_WITH_STACK

  • FASTVIDEO_TORCH_PROFILER_WITH_FLOPS

Traces can be visualized using https://ui.perfetto.dev/.

Best Practices#

  • Keep the profiled step count small; traces can be large and slow down job shutdown while the profiler flushes data.

  • After profiling, clean up trace directories to avoid filling disks.

  • When adding new regions, register them in fastvideo.profiler and wrap the corresponding code block with with self.profiler_controller.region("your_region"): or the @profile_region decorator.