🧠Finetuning¶
This guide covers finetuning video diffusion models with FastVideo, including full finetuning and LoRA.
Training Arguments¶
FastVideo training scripts use several argument groups:
Training Arguments¶
| Argument | Description |
|---|---|
| --max_train_steps | Total training steps |
| --train_batch_size | Batch size per GPU |
| --gradient_accumulation_steps | Steps to accumulate before optimizer update |
| --num_latent_t | Temporal latent dimension (reduce to save memory) |
| --num_height / --num_width | Video resolution |
| --num_frames | Number of frames per video |
| --output_dir | Directory for checkpoints |
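The per-GPU batch size and the accumulation steps together determine how many samples contribute to each optimizer update. The sketch below is a rough estimate, not FastVideo's own accounting: it assumes that GPUs inside one sequence-parallel group process the same samples, so the number of data-parallel replicas is the GPU count divided by the sequence parallel size. `effective_batch_size` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate samples per optimizer update.
# Assumption (not confirmed by the source): data-parallel replicas = num_gpus / sp_size.
effective_batch_size() {
  local num_gpus=$1 sp_size=$2 train_batch_size=$3 grad_accum=$4
  if (( num_gpus % sp_size != 0 )); then
    echo "num_gpus must be divisible by sp_size" >&2
    return 1
  fi
  local dp_replicas=$(( num_gpus / sp_size ))
  echo $(( dp_replicas * train_batch_size * grad_accum ))
}

# 4 GPUs, sp_size 2, batch size 1 per GPU, 8 accumulation steps
effective_batch_size 4 2 1 8   # prints 16
```

Under this assumption, raising the sequence parallel size to save memory also lowers the effective batch size unless the accumulation steps are raised to compensate.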
Parallelism Arguments¶
| Argument | Description |
|---|---|
| --num_gpus | Total number of GPUs |
| --sp_size | Sequence parallel size (increase to reduce memory per GPU) |
| --tp_size | Tensor parallel size |
| --hsdp_replicate_dim | HSDP replication dimension |
| --hsdp_shard_dim | HSDP sharding dimension |
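In hybrid sharded data parallelism, the replication and sharding dimensions usually multiply to the total GPU count. The check below is a sketch under that assumption. The source does not state which constraints FastVideo enforces, so treat `check_hsdp_layout` as a hypothetical pre-flight helper rather than the library's own validation.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for an HSDP layout.
# Assumption: hsdp_replicate_dim * hsdp_shard_dim must equal num_gpus.
check_hsdp_layout() {
  local num_gpus=$1 replicate=$2 shard=$3
  if (( replicate * shard == num_gpus )); then
    echo "ok: ${replicate} replica group(s) x ${shard} shard(s) = ${num_gpus} GPUs"
  else
    echo "mismatch: ${replicate} x ${shard} != ${num_gpus}" >&2
    return 1
  fi
}

# 4 GPUs: one replica group, model sharded across all four
check_hsdp_layout 4 1 4
```

Running this before launching a job catches a mismatched layout without waiting for distributed initialization.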
Optimizer Arguments¶
| Argument | Description |
|---|---|
| --learning_rate | Base learning rate |
| --mixed_precision | Precision mode (bf16 recommended) |
| --weight_decay | Weight decay for regularization |
| --max_grad_norm | Gradient clipping threshold |
Validation Arguments¶
| Argument | Description |
|---|---|
| --log_validation | Enable validation logging |
| --validation_dataset_file | JSON file with validation prompts |
| --validation_steps | Run validation every N steps |
| --validation_sampling_steps | Inference steps for validation |
| --validation_guidance_scale | CFG scale for validation |
Full Finetuning¶
Full finetuning updates all model weights. It typically gives the highest quality but requires more GPU memory than LoRA.
```bash
# Example: Wan2.1 T2V 1.3B full finetune (4 GPUs)
bash examples/training/finetune/wan_t2v_1.3B/crush_smol/finetune_t2v.sh
```
Typical settings:
- Learning rate: 1e-5 to 5e-5
- Gradient checkpointing: --enable_gradient_checkpointing_type "full"
- Memory scaling: Increase --sp_size or reduce --num_latent_t to fit in memory
LoRA Finetuning¶
LoRA (Low-Rank Adaptation) trains lightweight adapters while keeping the base model frozen. This significantly reduces memory usage and training time.
LoRA-Specific Arguments¶
| Argument | Description |
|---|---|
| --lora_training True | Enable LoRA mode |
| --lora_rank | Rank of LoRA adapters (16, 32, 64, 128) |
Learning Rate for LoRA¶
Important: LoRA typically requires a 10–20× higher learning rate than full finetuning because only the low-rank adapters are being trained while the base model is frozen.
| Training Mode | Recommended Learning Rate |
|---|---|
| Full finetune | 1e-5 to 5e-5 |
| LoRA | 1e-4 to 2e-4 |
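The LoRA range in the table corresponds to 10–20× the lower end of the full-finetune range (1e-5). A small sketch of that arithmetic follows; `scale_lr` is a hypothetical helper and is not part of FastVideo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scale a full-finetune learning rate for LoRA.
# awk does the floating-point multiplication; bash arithmetic is integer only.
scale_lr() {
  awk -v lr="$1" -v mult="$2" 'BEGIN { printf "%.0e\n", lr * mult }'
}

full_lr=1e-5
echo "LoRA learning rate range: $(scale_lr "$full_lr" 10) to $(scale_lr "$full_lr" 20)"
```

The result can then be passed as the value of --learning_rate for a LoRA run.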
Example LoRA Training¶
```bash
# Example: Wan2.1 T2V 1.3B LoRA finetune (1 GPU)
bash examples/training/finetune/wan_t2v_1.3B/crush_smol/finetune_t2v_lora.sh
```
Key differences from full finetune:
- Add --lora_training True --lora_rank 32
- Use a higher learning rate (10–20× full finetune)
- Can run on fewer GPUs (even a single GPU)
- Outputs adapter weights instead of a full model
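The memory savings come from the number of trainable parameters. For a weight matrix of shape d_out × d_in, a rank-r adapter trains r × (d_out + d_in) values instead of d_out × d_in, so there are far fewer gradients and optimizer states to hold. The sketch below illustrates the count for the ranks listed above. The 1024 × 1024 layer shape is a hypothetical example, not a value taken from any FastVideo model.

```shell
#!/usr/bin/env bash
# Trainable parameters for one weight matrix of shape d_out x d_in.
# The layer shape used below is hypothetical, chosen only for illustration.
full_params() { echo $(( $1 * $2 )); }           # d_out * d_in
lora_params() { echo $(( $3 * ($1 + $2) )); }    # rank * (d_out + d_in)

d_out=1024
d_in=1024
for rank in 16 32 64 128; do
  echo "rank=${rank}: $(lora_params "$d_out" "$d_in" "$rank") LoRA params vs $(full_params "$d_out" "$d_in") full"
done
```

At rank 32 this hypothetical layer trains 65536 adapter parameters instead of 1048576, a 16× reduction.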
LoRA Extraction and Merging¶
FastVideo provides tools to extract LoRA adapters from finetuned models and merge them back.
Extract LoRA Adapter¶
Extract a LoRA adapter by comparing a finetuned model to its base:
```bash
python scripts/lora_extraction/extract_lora.py \
  --base Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  --ft path/to/your/finetuned_model \
  --out adapter_r32.safetensors \
  --rank 32
```
| Argument | Description |
|---|---|
| --base | Base model (HuggingFace ID or local path) |
| --ft | Finetuned model path |
| --out | Output adapter file (.safetensors) |
| --rank | LoRA rank (16, 32, 64, 128) |
| --full-rank | Extract full-rank adapter (optional) |
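The source does not describe how extract_lora.py computes the adapter. A common way to extract a LoRA from a finetuned model, given here only as an assumed illustration, is a truncated SVD of the per-layer weight difference. The symbols below are introduced for this sketch.

```latex
\begin{align}
\Delta W &= W_{\mathrm{ft}} - W_{\mathrm{base}}
  && \text{weight difference for one layer} \\
\Delta W &= U \Sigma V^{\top}
  && \text{singular value decomposition} \\
\Delta W &\approx B A, \quad
  B = U_r \Sigma_r^{1/2}, \quad
  A = \Sigma_r^{1/2} V_r^{\top}
  && \text{keep the top } r \text{ singular values}
\end{align}
```

Under this reading, a larger --rank keeps more singular values and gives a closer approximation at the cost of a larger adapter file, and --full-rank would keep all of them.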
Merge LoRA Adapter¶
Merge an adapter back into a base model:
```bash
python scripts/lora_extraction/merge_lora.py \
  --base Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  --adapter adapter_r32.safetensors \
  --ft path/to/your/finetuned_model \
  --output merged_model
```
| Argument | Description |
|---|---|
| --base | Base model path |
| --adapter | LoRA adapter file |
| --ft | Finetuned model (for config reference) |
| --output | Output directory for merged model |
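Merging conceptually adds the low-rank product back onto each base weight. The formulas below are a sketch of the standard LoRA merge, not a description of merge_lora.py internals. They assume the adapter factors B and A already include any scaling, and that the adapter came from a rank-r truncated SVD of the weight difference with singular values σ_i.

```latex
\begin{align}
W_{\mathrm{merged}} &= W_{\mathrm{base}} + B A \\
\lVert W_{\mathrm{ft}} - W_{\mathrm{merged}} \rVert_F
  &= \sqrt{\textstyle\sum_{i > r} \sigma_i^{2}}
\end{align}
```

The second line is the Eckart–Young bound: the merged weights differ from the finetuned weights only by the discarded singular values, which is why a merged model at a low rank can deviate slightly from the original finetuned model.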
Validate Merged Model¶
Compare the merged model against the original finetuned model:
```bash
python scripts/lora_extraction/lora_inference_comparison.py \
  --base merged_model \
  --ft path/to/your/finetuned_model \
  --adapter NONE \
  --output-dir results \
  --prompt "A cat sitting on a windowsill" \
  --compute-ssim \
  --compute-lpips
```
Training Examples¶
Ready-to-run training scripts are available for multiple models:
→ Browse all training examples
| Model | Type | Example |
|---|---|---|
| Wan2.1 T2V 1.3B | T2V | examples/training/finetune/wan_t2v_1.3B/crush_smol/ |
| Wan2.1 I2V 14B | I2V | examples/training/finetune/wan_i2v_14B_480p/crush_smol/ |
| Wan2.1-Fun 1.3B InP | I2V | examples/training/finetune/Wan2.1-Fun-1.3B-InP/crush_smol/ |
| Wan2.1 VSA | T2V/I2V | examples/training/finetune/Wan2.1-VSA/Wan-Syn-Data/ |
Each example includes:
- download_dataset.sh: download sample data
- preprocess_*.sh: run preprocessing
- finetune_*.sh: full finetune launcher
- finetune_*_lora.sh: LoRA finetune launcher
- validation.json: validation prompts