Optimizations#
This page describes the various options for speeding up generation times in FastVideo.
Table of Contents#
Attention Backends
TeaCache
Attention Backends#
Available Backends#
Torch SDPA: FASTVIDEO_ATTENTION_BACKEND=TORCH_SDPA
Flash Attention 2 and 3: FASTVIDEO_ATTENTION_BACKEND=FLASH_ATTN
Sliding Tile Attention: FASTVIDEO_ATTENTION_BACKEND=SLIDING_TILE_ATTN
Sage Attention: FASTVIDEO_ATTENTION_BACKEND=SAGE_ATTN
Configuring Backends#
There are two ways to configure the attention backend in FastVideo.
1. In Python#
In Python, set the FASTVIDEO_ATTENTION_BACKEND environment variable before instantiating VideoGenerator, like this:
import os

os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "SLIDING_TILE_ATTN"
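For reference, here is a minimal end-to-end sketch. It assumes VideoGenerator.from_pretrained is the entry point and uses a placeholder model path; substitute the model you actually run:
import os

# Must be set before the generator is created
os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "SLIDING_TILE_ATTN"

from fastvideo import VideoGenerator

generator = VideoGenerator.from_pretrained("FastVideo/FastHunyuan-diffusers")  # placeholder model path
generator.generate_video(prompt="Your prompt here")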
2. In CLI#
You can also set the environment variable on the command line:
FASTVIDEO_ATTENTION_BACKEND=SAGE_ATTN python example.py
Flash Attention#
FLASH_ATTN
We recommend always installing Flash Attention 2:
pip install flash-attn==2.7.4.post1 --no-build-isolation
If you are using a Hopper or newer GPU (e.g., an H100), we also recommend installing Flash Attention 3 by compiling it from source (this takes about 10 minutes):
git clone https://github.com/Dao-AILab/flash-attention.git && cd flash-attention
cd hopper
pip install ninja
python setup.py install
Note
FastVideo will automatically detect and use FA3 if it is installed when the FLASH_ATTN backend is selected.
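If the build succeeded, you can verify that FA3 is importable. This assumes the hopper build installs the flash_attn_interface module, as described in the flash-attention repository:
python -c "import flash_attn_interface; print('FA3 available')"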
Sliding Tile Attention#
SLIDING_TILE_ATTN
pip install st_attn==0.0.4
Then download the STA mask strategy from Hugging Face:
python scripts/huggingface/download_hf.py --repo_id=FastVideo/STA_Mask_Strategy --local_dir=assets/ --repo_type=dataset
Please see this page for more installation instructions.
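With the wheel and mask strategy in place, select the backend as described above, for example:
FASTVIDEO_ATTENTION_BACKEND=SLIDING_TILE_ATTN python example.py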
Sage Attention#
SAGE_ATTN
To use SageAttention 2.1.1, please compile from source:
git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention
python setup.py install # or pip install -e .
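A quick import check to confirm the build (the package installs as the sageattention module):
python -c "import sageattention; print('SageAttention available')"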
TeaCache#
TeaCache is an optimization technique supported in FastVideo that can significantly speed up video generation by skipping redundant calculations across diffusion steps. This guide explains how to enable and configure TeaCache for optimal performance in FastVideo.
What is TeaCache?#
TeaCache (Timestep Embedding Aware Cache) is a training-free technique that uses the diffusion model's timestep embeddings to estimate how much the transformer's output will change at each step, and reuses the cached output from the previous step when the estimated change is small. See the official TeaCache repo and their paper for more details.
How to Enable TeaCache#
Enabling TeaCache is straightforward: simply add the enable_teacache=True parameter to your generate_video() call:
# ... previous code
generator.generate_video(
    prompt="Your prompt here",
    sampling_param=params,
    enable_teacache=True,
)
# more code ...
Complete Example#
A complete example of using TeaCache for faster video generation is included in the repository. You can run it using the following command:
python examples/inference/optimizations/teacache_example.py
Advanced Configuration#
While TeaCache works well with default settings, you can fine-tune its behavior by adjusting the threshold value:
Lower threshold values (e.g., 0.1) skip more calculations, yielding faster generation with slightly more potential for quality degradation.
Higher threshold values (e.g., 0.15-0.23) skip fewer calculations and keep quality closer to the original.
The optimal threshold depends on your specific model and content.
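The exact argument for setting the threshold depends on your FastVideo version; the sketch below uses a hypothetical teacache_thresh parameter purely for illustration. Check examples/inference/optimizations/teacache_example.py for the name your version actually exposes:
generator.generate_video(
    prompt="Your prompt here",
    enable_teacache=True,
    teacache_thresh=0.1,  # hypothetical parameter name: lower = faster, higher = closer to baseline quality
)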
Benchmarking different optimizations#
To benchmark the performance improvement, try generating the same video with and without TeaCache enabled and compare the generation times:
import time

# Without TeaCache
start_time = time.perf_counter()
generator.generate_video(prompt="Your prompt", enable_teacache=False)
standard_time = time.perf_counter() - start_time

# With TeaCache
start_time = time.perf_counter()
generator.generate_video(prompt="Your prompt", enable_teacache=True)
teacache_time = time.perf_counter() - start_time

print(f"Standard generation: {standard_time:.2f} seconds")
print(f"TeaCache generation: {teacache_time:.2f} seconds")
print(f"Speedup: {standard_time/teacache_time:.2f}x")
Note: If you want to benchmark different attention backends, you'll need to reinstantiate VideoGenerator.
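As a starting point, here is a minimal sketch for timing two backends. It assumes VideoGenerator.from_pretrained is the entry point and uses a placeholder model path:
import os
import time

from fastvideo import VideoGenerator

results = {}
for backend in ["FLASH_ATTN", "SAGE_ATTN"]:
    # The backend must be set before the generator is instantiated
    os.environ["FASTVIDEO_ATTENTION_BACKEND"] = backend
    generator = VideoGenerator.from_pretrained("FastVideo/FastHunyuan-diffusers")  # placeholder model path
    start_time = time.perf_counter()
    generator.generate_video(prompt="Your prompt")
    results[backend] = time.perf_counter() - start_time

for backend, seconds in results.items():
    print(f"{backend}: {seconds:.2f} seconds")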