🔍 FastVideo Overview#
This document outlines FastVideo’s architecture for developers interested in framework internals or contributions. It serves as an onboarding guide for new contributors by providing an overview of the most important directories and files within the fastvideo/v1/ codebase.
Table of Contents - V1 Directory Structure and Files#
fastvideo/v1/pipelines/ - Core diffusion pipeline components
fastvideo/v1/models/ - Model implementations
  dits/ - Transformer-based diffusion models
  vaes/ - Variational autoencoders
  encoders/ - Text and image encoders
  schedulers/ - Diffusion schedulers
fastvideo/v1/attention/ - Optimized attention implementations
fastvideo/v1/distributed/ - Distributed computing utilities
fastvideo/v1/layers/ - Custom neural network layers
fastvideo/v1/platforms/ - Hardware platform abstractions
fastvideo/v1/worker/ - Multi-GPU process management
fastvideo/v1/fastvideo_args.py - Argument handling
fastvideo/v1/forward_context.py - Forward pass context management
fastvideo/v1/utils.py - Utility functions
fastvideo/v1/logger.py - Logging infrastructure
Core Architecture#
FastVideo separates model components from execution logic with these principles:
Component Isolation: Models (encoders, VAEs, transformers) are isolated from execution (pipelines, stages, distributed processing)
Modular Design: Components can be independently replaced
Distributed Execution: Supports various parallelism strategies (Tensor, Sequence)
Custom Attention Backends: Components can support and use different Attention implementations
Pipeline Abstraction: Consistent interface across diffusion models
FastVideoArgs#
The FastVideoArgs class in fastvideo/v1/fastvideo_args.py serves as the central configuration system for FastVideo. It contains all parameters needed to control model loading, inference configuration, performance optimization settings, and more.
Key features include:
Command-line Interface: Automatic conversion between CLI arguments and dataclass fields
Configuration Groups: Organized by functional areas (model loading, video params, optimization settings)
Context Management: Global access to current settings via get_current_fastvideo_args()
Parameter Validation: Ensures valid combinations of settings
Common configuration areas:
Model paths and loading options: model_path, trust_remote_code, revision
Distributed execution settings: num_gpus, tp_size, sp_size
Video generation parameters: height, width, num_frames, num_inference_steps
Precision settings: Control computation precision for different components
Example usage:
# Load arguments from command line
fastvideo_args = prepare_fastvideo_args(sys.argv[1:])
# Access parameters
model = load_model(fastvideo_args.model_path)
# Set as global context
with set_current_fastvideo_args(fastvideo_args):
    # Code that requires access to these arguments
    result = generate_video()
Pipeline System#
ComposedPipelineBase#
This foundational class provides:
Model Loading: Automatically loads components from HuggingFace-Diffusers-compatible model directories
Stage Management: Creates and orchestrates processing stages
Data Flow Coordination: Ensures proper state flow between stages
class MyCustomPipeline(ComposedPipelineBase):
    _required_config_modules = [
        "text_encoder", "tokenizer", "vae", "transformer", "scheduler"
    ]

    def initialize_pipeline(self, fastvideo_args: FastVideoArgs):
        # Pipeline-specific initialization
        pass

    def create_pipeline_stages(self, fastvideo_args: FastVideoArgs):
        self.add_stage("input_validation_stage", InputValidationStage())
        self.add_stage("text_encoding_stage", CLIPTextEncodingStage(
            text_encoder=self.get_module("text_encoder"),
            tokenizer=self.get_module("tokenizer")
        ))
        # Additional stages...
Pipeline Stages#
Each stage handles a specific diffusion process component:
Input Validation: Parameter verification
Text Encoding: CLIP, LLaMA, or T5-based encoding
Image Encoding: Image input processing
Timestep & Latent Preparation: Setup for diffusion
Denoising: Core diffusion loop
Decoding: Latent-to-pixel conversion
Each stage implements a standard interface:
def forward(self, batch: ForwardBatch, fastvideo_args: FastVideoArgs) -> ForwardBatch:
    # Process batch and update state
    return batch
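For illustration, a concrete stage might look like the following sketch (the latents field and the scaling factor here are illustrative assumptions, not FastVideo's actual values):

class LatentScalingStage:
    """Hypothetical stage that rescales latents before denoising."""

    def forward(self, batch: ForwardBatch, fastvideo_args: FastVideoArgs) -> ForwardBatch:
        # Read state produced by earlier stages, transform it, and write
        # it back so downstream stages see the updated values.
        batch.latents = batch.latents * 0.18215  # illustrative scaling factor
        return batch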
ForwardBatch#
Defined in fastvideo/v1/pipelines/pipeline_batch_info.py
, ForwardBatch
encapsulates the data payload passed between pipeline stages. It typically holds:
Input Data: Prompts, images, generation parameters
Intermediate State: Embeddings, latents, and timesteps accumulated during stage execution
Output Storage: Generated results and metadata
Configuration: Sampling parameters, precision settings
This structure facilitates clear state transitions between stages.
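A simplified sketch of the idea (the field names here are illustrative; see pipeline_batch_info.py for the actual definition):

from dataclasses import dataclass, field
from typing import Any, Optional

import torch

@dataclass
class ForwardBatchSketch:
    # Input data
    prompt: Optional[str] = None
    num_inference_steps: int = 50
    # Intermediate state, accumulated as stages execute
    prompt_embeds: Optional[torch.Tensor] = None
    latents: Optional[torch.Tensor] = None
    timesteps: Optional[torch.Tensor] = None
    # Output storage and extra metadata
    output: Optional[torch.Tensor] = None
    extra: dict[str, Any] = field(default_factory=dict)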
Model Components#
The fastvideo/v1/models/ directory contains implementations of the core neural network models used in video diffusion:
Transformer Models#
Transformer networks perform the actual denoising during diffusion:
Location: fastvideo/v1/models/dits/
Examples: WanTransformer3DModel, HunyuanVideoTransformer3DModel
Features include:
Text/image conditioning
Standardized interface for model-specific optimizations
def forward(
    self,
    latents,                           # [B, T, C, H, W]
    encoder_hidden_states,             # Text embeddings
    timestep,                          # Current diffusion timestep
    encoder_hidden_states_image=None,  # Optional image embeddings
    **kwargs
):
    # Perform denoising computation
    return noise_pred  # Predicted noise residual
VAE (Variational Auto-Encoder)#
VAEs handle conversion between pixel space and latent space:
Location: fastvideo/v1/models/vaes/
Examples: AutoencoderKLWan, AutoencoderKLHunyuanVideo
These models compress image/video data to a more efficient latent representation (typically 4x-8x smaller in each dimension).
FastVideo’s VAE implementations include:
Efficient video batch processing
Memory optimization
Optional tiling for large frames
Distributed weight support
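The typical round trip looks like this (a sketch assuming a Diffusers-style encode/decode API; FastVideo's wrappers may differ in details):

# pixels: video tensor, e.g. [B, C, T, H, W], values in [-1, 1]
latents = vae.encode(pixels).latent_dist.sample()
latents = latents * vae.config.scaling_factor  # scale into the model's latent range

# ... denoising happens in latent space ...

video = vae.decode(latents / vae.config.scaling_factor).sample  # back to pixel space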
Text and Image Encoders#
Encoders process conditioning inputs into embeddings:
Location: fastvideo/v1/models/encoders/
Text Encoders: CLIPTextModel, LlamaModel, UMT5EncoderModel
Image Encoders: CLIPVisionModel
FastVideo implements optimizations such as:
Vocab parallelism for distributed processing
Caching for common prompts
Precision-tuned computation
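Prompt caching, for instance, can be as simple as memoizing embeddings keyed by the prompt string (a minimal sketch, not FastVideo's actual cache):

import torch

_prompt_cache: dict[str, torch.Tensor] = {}

def encode_prompt_cached(text_encoder, tokenizer, prompt: str) -> torch.Tensor:
    # Reuse embeddings for prompts we have already encoded
    if prompt not in _prompt_cache:
        tokens = tokenizer(prompt, return_tensors="pt", padding=True)
        with torch.no_grad():
            _prompt_cache[prompt] = text_encoder(**tokens).last_hidden_state
    return _prompt_cache[prompt]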
Schedulers#
Schedulers manage the diffusion sampling process:
Location: fastvideo/v1/models/schedulers/
Examples: UniPCMultistepScheduler, FlowMatchEulerDiscreteScheduler
These components control:
Diffusion timestep sequences
Noise prediction to latent update conversions
Quality/speed trade-offs
def step(
    self,
    model_output: torch.Tensor,
    timestep: torch.LongTensor,
    sample: torch.Tensor,
    **kwargs
) -> torch.Tensor:
    # Process model output and update latents
    # Return updated latents
    return prev_sample
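Put together, the scheduler drives the denoising loop roughly like this (a sketch built on the step interface above; set_timesteps follows the Diffusers convention and is an assumption here):

scheduler.set_timesteps(num_inference_steps)

for t in scheduler.timesteps:
    # Transformer predicts the noise residual at this timestep
    noise_pred = transformer(latents, encoder_hidden_states=prompt_embeds, timestep=t)
    # Scheduler converts the prediction into updated latents
    latents = scheduler.step(noise_pred, t, latents)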
Optimized Attention#
The fastvideo/v1/attention/ directory contains optimized attention implementations crucial for efficient video diffusion:
Attention Backends#
Multiple implementations with automatic selection:
FLASH_ATTN: FlashAttention kernels, optimized for hardware that supports them
TORCH_SDPA: Built-in PyTorch scaled dot-product attention
SLIDING_TILE_ATTN: For very long sequences
# Configure available attention backends for this layer
self.attn = LocalAttention(
    num_heads=num_heads,
    head_size=head_dim,
    causal=False,
    supported_attention_backends=(_Backend.FLASH_ATTN, _Backend.TORCH_SDPA)
)

# Override via environment variable:
# export FASTVIDEO_ATTENTION_BACKEND=FLASH_ATTN
Attention Patterns#
Supports various patterns with memory optimization techniques:
Cross/Self/Temporal/Global-Local Attention
Chunking, progressive computation, optimized masking
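As one example of these techniques, chunking splits the query sequence and computes attention one slice at a time to bound peak memory (a generic sketch using PyTorch SDPA, not FastVideo's optimized kernels):

import torch
import torch.nn.functional as F

def chunked_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                      chunk_size: int = 1024) -> torch.Tensor:
    # q, k, v: [B, num_heads, seq_len, head_dim]
    outputs = []
    for start in range(0, q.shape[2], chunk_size):
        q_chunk = q[:, :, start:start + chunk_size]
        # Each query chunk still attends over the full key/value sequence
        outputs.append(F.scaled_dot_product_attention(q_chunk, k, v))
    return torch.cat(outputs, dim=2)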
Distributed Processing#
The fastvideo/v1/distributed/ directory contains implementations for distributed model execution:
Tensor Parallelism#
Tensor parallelism splits model weights across devices:
Implementation: RowParallelLinear and ColumnParallelLinear layers
Use cases: Encoder models, since their shorter sequence lengths make weight sharding efficient
# Tensor-parallel layers in a transformer block
from fastvideo.v1.layers.linear import ColumnParallelLinear, RowParallelLinear

# Split along output dimension
self.qkv_proj = ColumnParallelLinear(
    input_size=hidden_size,
    output_size=3 * hidden_size,
    bias=True,
    gather_output=False
)

# Split along input dimension
self.out_proj = RowParallelLinear(
    input_size=hidden_size,
    output_size=hidden_size,
    bias=True,
    input_is_parallel=True
)
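This column-then-row pairing follows the Megatron-LM pattern: the column-parallel projection leaves its output sharded (gather_output=False), the intermediate computation runs on local shards, and the row-parallel projection combines the partial results with a single all-reduce, so only one communication step is needed per projection pair.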
Sequence Parallelism#
Sequence parallelism splits sequences across devices:
Implementation: DistributedAttention and sequence splitting
Use cases: Long video sequences and high-resolution processing; used by the DiT models
# Distributed attention for long sequences
from fastvideo.v1.attention import DistributedAttention

self.attn = DistributedAttention(
    num_heads=num_heads,
    head_size=head_dim,
    causal=False,
    supported_attention_backends=(_Backend.SLIDING_TILE_ATTN, _Backend.FLASH_ATTN)
)
Communication Primitives#
Efficient communication primitives (AllGather, AllReduce, and synchronization mechanisms) minimize distributed overhead:
Sequence-Parallel AllGather: Collects sequence chunks
Tensor-Parallel AllReduce: Combines partial results
Distributed Synchronization: Coordinates execution
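Conceptually these map onto standard torch.distributed collectives (a sketch assuming an initialized process group; FastVideo wraps these in its own utilities):

import torch
import torch.distributed as dist

def gather_sequence_chunks(local_chunk: torch.Tensor, world_size: int) -> torch.Tensor:
    # Sequence-parallel AllGather: every rank contributes its chunk and
    # receives the full sequence, concatenated along the sequence dim.
    chunks = [torch.empty_like(local_chunk) for _ in range(world_size)]
    dist.all_gather(chunks, local_chunk)
    return torch.cat(chunks, dim=1)

def reduce_partial_output(partial: torch.Tensor) -> torch.Tensor:
    # Tensor-parallel AllReduce: sum partial results across ranks in place.
    dist.all_reduce(partial, op=dist.ReduceOp.SUM)
    return partial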
Forward Context Management#
ForwardContext#
Defined in fastvideo/v1/forward_context.py, ForwardContext manages execution-specific state within a forward pass, particularly for low-level optimizations. It is accessed via get_forward_context().
Attention Metadata: Configuration for optimized attention kernels (attn_metadata)
Profiling Data: Potential hooks for performance metrics collection
This context-based approach enables:
Dynamic optimization based on execution state (e.g., attention backend selection)
Step-specific customizations within model components
Usage example:
with set_forward_context(current_timestep, attn_metadata, fastvideo_args):
    # During this forward pass, components can access context
    # through get_forward_context()
    output = model(inputs)
Executor and Worker System#
The fastvideo/v1/worker/ directory contains the distributed execution framework:
Executor Abstraction#
FastVideo implements a flexible execution model for distributed processing:
Executor Base Class: An abstract base class defining the interface for all executors
MultiProcExecutor: Primary implementation that spawns and manages worker processes
GPU Workers: Handle actual model execution on individual GPUs
The MultiProcExecutor implementation:
Spawns worker processes for each GPU
Establishes communication channels via pipes
Coordinates distributed operations across workers
Handles graceful startup and shutdown of the process group
Each GPU worker:
Initializes the distributed environment
Builds the pipeline for the specified model
Executes requested operations on its assigned GPU
Manages local resources and communicates results back to the executor
This design allows FastVideo to efficiently utilize multiple GPUs while providing a simple, unified interface for model execution.
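In outline, the pattern looks like this (a heavily simplified sketch of the spawn-and-pipe structure; run_pipeline_method is a hypothetical stand-in for the worker's dispatch logic, not actual FastVideo code):

import torch.multiprocessing as mp

def _worker_loop(rank: int, conn) -> None:
    # Each worker initializes its GPU and pipeline, then serves
    # requests from the executor until told to shut down.
    while True:
        method, args = conn.recv()
        if method == "shutdown":
            break
        conn.send(run_pipeline_method(rank, method, args))  # hypothetical helper

class SimpleMultiProcExecutor:
    def __init__(self, num_gpus: int):
        ctx = mp.get_context("spawn")
        self.conns = []
        self.procs = []
        for rank in range(num_gpus):
            parent, child = ctx.Pipe()
            proc = ctx.Process(target=_worker_loop, args=(rank, child))
            proc.start()
            self.conns.append(parent)
            self.procs.append(proc)

    def run(self, method: str, args=()):
        # Broadcast the request to every worker, then collect results
        for conn in self.conns:
            conn.send((method, args))
        return [conn.recv() for conn in self.conns]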
Platforms#
The fastvideo/v1/platforms/ directory provides hardware platform abstractions that enable FastVideo to run efficiently on different hardware configurations:
Platform Abstraction#
FastVideo’s platform abstraction layer enables:
Hardware Detection: Automatic detection of available hardware
Backend Selection: Appropriate selection of compute kernels
Memory Management: Efficient utilization of hardware-specific memory features
The primary components include:
Platform Interface: Defines the common API for all platform implementations
CUDA Platform: Optimized implementation for NVIDIA GPUs
Backend Enum: Used throughout the codebase for feature selection
Usage example:
from fastvideo.v1.platforms import current_platform, _Backend

# Check hardware capabilities
if current_platform.supports_backend(_Backend.FLASH_ATTN):
    # Use the FlashAttention implementation
    ...
else:
    # Fall back to a standard implementation
    ...
The platform system is designed to be extensible for future hardware targets.
Logger#
See PR
TODO: (help wanted) Add an environment variable that disables process-aware logging.
Contributing to FastVideo#
If you’re a new contributor, here are some common areas to explore:
Adding a new model: Implement new model types in the appropriate subdirectory of fastvideo/v1/models/
Optimizing performance: Look at attention implementations or memory management
Adding a new pipeline: Create a new pipeline subclass in fastvideo/v1/pipelines/
Hardware support: Extend the platforms module for new hardware targets
When adding code, follow these practices:
Use type hints for better code readability
Add appropriate docstrings
Maintain the separation between model components and execution logic
Follow existing patterns for distributed processing
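For example, a type-hinted, documented helper might look like this (an illustrative function, not existing code):

import torch

def resize_latents(latents: torch.Tensor, scale_factor: float) -> torch.Tensor:
    """Rescale latents by scale_factor.

    Args:
        latents: Latent tensor, e.g. of shape [B, T, C, H, W].
        scale_factor: Multiplier applied elementwise.

    Returns:
        The rescaled latent tensor.
    """
    return latents * scale_factor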