🔧 Installation
You can install the Video Sparse Attention (VSA) package from source:
git submodule update --init --recursive
python setup_vsa.py install
Building from Source
VSA supports H100 (via ThunderKittens kernels) and any other GPU (via a Triton backend).
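To see which of the two paths applies on your machine, you can check the GPU's compute capability; this is a minimal sketch assuming PyTorch is installed (H100 reports compute capability 9.0):
import torch

# H100 reports compute capability 9.0 and can use the ThunderKittens kernels;
# any other CUDA GPU goes through the Triton backend.
major, minor = torch.cuda.get_device_capability(0)
backend = "ThunderKittens (H100)" if (major, minor) == (9, 0) else "Triton"
print(f"Compute capability {major}.{minor} -> {backend}")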
First, install a C++20-capable compiler toolchain for ThunderKittens (only needed on H100):
sudo apt update
sudo apt install gcc-11 g++-11
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 100 --slave /usr/bin/g++ g++ /usr/bin/g++-11
sudo apt update
sudo apt install clang-11
Set up the CUDA environment (this example assumes CUDA 12.4; adjust the path for your installed version):
export CUDA_HOME=/usr/local/cuda-12.4
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
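Before building, it can help to confirm that your PyTorch build is compatible with the toolkit you just pointed CUDA_HOME at; a small sanity check, assuming PyTorch with CUDA support is installed:
import torch

# The CUDA version this PyTorch build targets; it should be compatible with
# the toolkit at CUDA_HOME (12.4 in the example above).
print(torch.version.cuda)
# True once the driver and toolkit are set up correctly.
print(torch.cuda.is_available())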
Install VSA:
cd csrc/attn/
git submodule update --init --recursive
python setup_vsa.py install
🧪 Test
python csrc/attn/tests/test_vsa.py
Usage
from vsa import video_sparse_attn
# q, k, v: [batch_size, num_heads, seq_len, head_dim]
# variable_block_sizes: [num_blocks] - number of valid tokens in each block
# topk: int - number of top-k blocks to attend to
# block_size: int or tuple of 3 ints - size of each block (default: 64 tokens)
# compress_attn_weight: optional weight for compressed attention branch
output = video_sparse_attn(q, k, v, variable_block_sizes, topk, block_size, compress_attn_weight)
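Below is a minimal end-to-end sketch of the call above. The dtype and device (bfloat16 on CUDA), the keyword-argument style, and the toy shapes are assumptions for illustration; see csrc/attn/tests/test_vsa.py for the configurations the kernels are actually tested with.
import torch
from vsa import video_sparse_attn

batch_size, num_heads, head_dim = 1, 8, 64
block_size = 64                      # tokens per block (the default)
num_blocks = 16
seq_len = num_blocks * block_size    # every block fully populated here

q = torch.randn(batch_size, num_heads, seq_len, head_dim,
                device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Number of valid tokens per block; ragged video shapes would use smaller
# counts for partially filled blocks.
variable_block_sizes = torch.full((num_blocks,), block_size,
                                  device="cuda", dtype=torch.int32)

output = video_sparse_attn(q, k, v, variable_block_sizes,
                           topk=4, block_size=block_size)
print(output.shape)  # [batch_size, num_heads, seq_len, head_dim]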
Inference
bash scripts/inference/v1_inference_wan_VSA.sh