## πŸ”§ Installation

You can install the Video Sparse Attention (VSA) package from the repository root with:

```bash
cd csrc/attn/
git submodule update --init --recursive
python setup_vsa.py install
```

## Building from Source

We support H100 (via ThunderKittens) and any other GPU (via Triton) for VSA.

First, install a compiler with C++20 support, which ThunderKittens requires (H100 builds only):

```bash
sudo apt update
sudo apt install gcc-11 g++-11

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 100 --slave /usr/bin/g++ g++ /usr/bin/g++-11

sudo apt update
sudo apt install clang-11
```
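If you are unsure whether the resulting toolchain is new enough, a quick smoke test is to compile an empty program with `-std=c++20`. This is a sketch, not part of the build; the `/tmp/cxx20_check.cpp` path is arbitrary:

```shell
# Write a trivial translation unit and try to compile it in C++20 mode.
echo 'int main() { return 0; }' > /tmp/cxx20_check.cpp
if g++ -std=c++20 /tmp/cxx20_check.cpp -o /tmp/cxx20_check 2>/dev/null; then
  echo "C++20 OK"
else
  echo "compiler lacks C++20 support; install gcc-11 or newer"
fi
```

If the check fails, re-run the `update-alternatives` step above so `g++` resolves to g++-11.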

Set up the CUDA environment (example for CUDA 12.4):

```bash
export CUDA_HOME=/usr/local/cuda-12.4
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
```

Install VSA:

```bash
cd csrc/attn/
git submodule update --init --recursive
python setup_vsa.py install
```

## πŸ§ͺ Test

```bash
python csrc/attn/tests/test_vsa.py
```

## πŸ“‹ Usage

```python
from vsa import video_sparse_attn

# q, k, v: [batch_size, num_heads, seq_len, head_dim]
# variable_block_sizes: [num_blocks] - number of valid tokens in each block
# topk: int - number of top-k blocks to attend to
# block_size: int or tuple of 3 ints - size of each block (default: 64 tokens)
# compress_attn_weight: optional weight for the compressed attention branch

output = video_sparse_attn(q, k, v, variable_block_sizes, topk, block_size, compress_attn_weight)
```
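To illustrate what `variable_block_sizes` holds, here is a hypothetical helper (not part of the `vsa` API) that computes the number of valid, non-padding tokens per block when a video latent of shape `(T, H, W)` is padded up to multiples of a 3-D block size. The dims `(5, 6, 7)` and block `(4, 4, 4)` (64 tokens per block) are example values; the package's actual block layout may differ:

```python
import math

def block_valid_tokens(dims, block):
    """Valid (non-padding) token count per block, in row-major block order,
    when each axis of `dims` is padded up to a multiple of `block`."""
    def splits(n, b):
        # Sizes of the valid slices along one axis, last block possibly partial.
        return [min(b, n - i * b) for i in range(math.ceil(n / b))]
    return [
        t * h * w
        for t in splits(dims[0], block[0])
        for h in splits(dims[1], block[1])
        for w in splits(dims[2], block[2])
    ]

sizes = block_valid_tokens((5, 6, 7), (4, 4, 4))
print(len(sizes), sum(sizes))  # β†’ 8 210 (8 blocks covering all 5*6*7 tokens)
```

Only the full blocks contain 64 tokens; boundary blocks hold fewer, which is exactly the information the kernel needs to mask out padding.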

## πŸš€ Inference

```bash
bash scripts/inference/v1_inference_wan_VSA.sh
```