Skip to content

🧠 Finetuning

This guide covers finetuning video diffusion models with FastVideo, including full finetuning and LoRA.

Training Arguments

FastVideo training scripts use several argument groups:

Training Arguments

Argument Description
--max_train_steps Total training steps
--train_batch_size Batch size per GPU
--gradient_accumulation_steps Steps to accumulate before optimizer update
--num_latent_t Temporal latent dimension (reduce to save memory)
--num_height / --num_width Video resolution
--num_frames Number of frames per video
--output_dir Directory for checkpoints

Parallelism Arguments

Argument Description
--num_gpus Total number of GPUs
--sp_size Sequence parallel size (increase to reduce memory per GPU)
--tp_size Tensor parallel size
--hsdp_replicate_dim HSDP replication dimension
--hsdp_shard_dim HSDP sharding dimension

Optimizer Arguments

Argument Description
--learning_rate Base learning rate
--mixed_precision Precision mode (bf16 recommended)
--weight_decay Weight decay for regularization
--max_grad_norm Gradient clipping threshold

Validation Arguments

Argument Description
--log_validation Enable validation logging
--validation_dataset_file JSON file with validation prompts
--validation_steps Run validation every N steps
--validation_sampling_steps Inference steps for validation
--validation_guidance_scale CFG scale for validation

Full Finetuning

Full finetuning updates all model weights. This provides the best quality but requires more GPU memory.

# Example: Wan2.1 T2V 1.3B full finetune (4 GPUs)
bash examples/training/finetune/wan_t2v_1.3B/crush_smol/finetune_t2v.sh

Typical settings:

  • Learning rate: 1e-5 to 5e-5
  • Gradient checkpointing: --enable_gradient_checkpointing_type "full"
  • Memory scaling: Increase --sp_size or reduce --num_latent_t to fit in memory

LoRA Finetuning

LoRA (Low-Rank Adaptation) trains lightweight adapters while keeping the base model frozen. This significantly reduces memory usage and training time.

LoRA-Specific Arguments

Argument Description
--lora_training True Enable LoRA mode
--lora_rank Rank of LoRA adapters (16, 32, 64, 128)

Learning Rate for LoRA

Important: LoRA typically requires a 10–20× higher learning rate than full finetuning because only the low-rank adapters are being trained while the base model is frozen.

Training Mode Recommended Learning Rate
Full finetune 1e-5 to 5e-5
LoRA 1e-4 to 2e-4

Example LoRA Training

# Example: Wan2.1 T2V 1.3B LoRA finetune (1 GPU)
bash examples/training/finetune/wan_t2v_1.3B/crush_smol/finetune_t2v_lora.sh

Key differences from full finetune:

  • Add --lora_training True --lora_rank 32
  • Use higher learning rate (10–20× full finetune)
  • Can run on fewer GPUs (even single GPU)
  • Outputs adapter weights instead of full model

LoRA Extraction and Merging

FastVideo provides tools to extract LoRA adapters from finetuned models and merge them back.

Extract LoRA Adapter

Extract a LoRA adapter by comparing a finetuned model to its base:

python scripts/lora_extraction/extract_lora.py \
  --base Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  --ft path/to/your/finetuned_model \
  --out adapter_r32.safetensors \
  --rank 32
Argument Description
--base Base model (HuggingFace ID or local path)
--ft Finetuned model path
--out Output adapter file (.safetensors)
--rank LoRA rank (16, 32, 64, 128)
--full-rank Extract full-rank adapter (optional)

Merge LoRA Adapter

Merge an adapter back into a base model:

python scripts/lora_extraction/merge_lora.py \
  --base Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  --adapter adapter_r32.safetensors \
  --ft path/to/your/finetuned_model \
  --output merged_model
Argument Description
--base Base model path
--adapter LoRA adapter file
--ft Finetuned model (for config reference)
--output Output directory for merged model

Validate Merged Model

Compare the merged model against the original finetuned model:

python scripts/lora_extraction/lora_inference_comparison.py \
  --base merged_model \
  --ft path/to/your/finetuned_model \
  --adapter NONE \
  --output-dir results \
  --prompt "A cat sitting on a windowsill" \
  --compute-ssim \
  --compute-lpips

Training Examples

Ready-to-run training scripts are available for multiple models:

→ Browse all training examples

Model Type Example
Wan2.1 T2V 1.3B T2V examples/training/finetune/wan_t2v_1.3B/crush_smol/
Wan2.1 I2V 14B I2V examples/training/finetune/wan_i2v_14B_480p/crush_smol/
Wan2.1-Fun 1.3B InP I2V examples/training/finetune/Wan2.1-Fun-1.3B-InP/crush_smol/
Wan2.1 VSA T2V/I2V examples/training/finetune/Wan2.1-VSA/Wan-Syn-Data/

Each example includes:

  • download_dataset.sh — download sample data
  • preprocess_*.sh — run preprocessing
  • finetune_*.sh — full finetune launcher
  • finetune_*_lora.sh — LoRA finetune launcher
  • validation.json — validation prompts