rotary_embedding_3d
3D Rotary Position Embedding (RoPE) for video transformers.
Reference: https://arxiv.org/pdf/2104.09864.pdf
Classes
fastvideo.layers.rotary_embedding_3d.RotaryPositionalEmbedding3D
Bases: Module
3D Rotary Positional Embedding for video transformers.
Splits the head dimension across temporal, height, and width dimensions, computing separate rotary embeddings for each and concatenating them.
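As a rough illustration of the split described above, the sketch below divides `head_dim` into three even-sized parts, one per axis. The helper name `split_head_dim` and the rule of handing leftover channel pairs to the temporal axis are assumptions made for illustration; the partition actually used by `RotaryPositionalEmbedding3D` is not stated here and may differ.

```python
def split_head_dim(head_dim: int) -> tuple[int, int, int]:
    """Hypothetical split of head_dim into (temporal, height, width) parts.

    Rotary embeddings rotate channels in pairs, so each part is kept even.
    """
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even for rotary embeddings")
    pairs = head_dim // 2
    per_axis, leftover = divmod(pairs, 3)
    # Assumption: any leftover pairs go to the temporal axis.
    t_pairs = per_axis + leftover
    return 2 * t_pairs, 2 * per_axis, 2 * per_axis


dim_t, dim_h, dim_w = split_head_dim(64)
```

Whatever rule is used, the three parts need to sum to `head_dim` so that the concatenated embedding lines up with the query and key tensors.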
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `head_dim` | `int` | Dimension of each attention head | required |
| `base` | `float` | Base value for exponential frequency | `10000.0` |
Source code in fastvideo/layers/rotary_embedding_3d.py
Functions
fastvideo.layers.rotary_embedding_3d.RotaryPositionalEmbedding3D.forward
Apply 3D rotary positional embedding to queries and keys.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `q` | `Tensor` | Query tensor [B, num_heads, seq_len, head_dim] | required |
| `k` | `Tensor` | Key tensor [B, num_heads, seq_len, head_dim] | required |
| `grid_size` | `tuple[int, int, int]` | (T, H, W) tuple of grid dimensions | required |
Returns:
| Type | Description |
|---|---|
| `(q_rotated, k_rotated)` | Rotated query and key tensors |
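To make the rotation step concrete, here is a minimal, framework-free sketch of the usual RoPE update `x * cos + rotate_half(x) * sin` on plain Python lists, shown for a single axis. It is an illustration, not the code from the library: the helper names (`apply_rotary`, `angles_for`), the split-halves pairing, and the 1D position are all assumptions. It also checks the property that makes RoPE useful: the query-key dot product depends only on the relative offset between positions.

```python
import math


def rotate_half(x):
    # Assumed convention: pair channel i with channel i + len(x) // 2.
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def apply_rotary(x, angles):
    # Standard RoPE update: x * cos(angle) + rotate_half(x) * sin(angle).
    rotated = rotate_half(x)
    return [xi * math.cos(a) + ri * math.sin(a)
            for xi, ri, a in zip(x, rotated, angles)]


def angles_for(position, head_dim, base=10000.0):
    # One angle per channel pair, duplicated so both channels of a pair share it.
    half = [position * base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]
    return half + half


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]

# Same relative offset (2) at different absolute positions.
score_a = dot(apply_rotary(q, angles_for(3, 4)), apply_rotary(k, angles_for(1, 4)))
score_b = dot(apply_rotary(q, angles_for(10, 4)), apply_rotary(k, angles_for(8, 4)))
```

In the 3D case the same update would be applied with angles built from the (t, h, w) coordinates of each token, each axis acting on its own slice of the head dimension.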
fastvideo.layers.rotary_embedding_3d.RotaryPositionalEmbedding3D.precompute_freqs_3d
Precompute 3D rotary frequencies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `grid_size` | `tuple[int, int, int]` | (num_frames, height, width) | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `freqs` | `Tensor` | [THW, head_dim] tensor of frequencies |
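The following sketch shows one way a `[THW, head_dim]` table of rotation angles could be built, using plain Python lists in place of tensors. It is not the library's implementation: the function names, the per-axis dimensions passed in `dims`, the frame-major flattening order, and the duplicated-halves layout are all assumptions chosen for illustration. Only the inverse-frequency formula `base ** (-2i / dim)` is the standard RoPE definition.

```python
def axis_inv_freqs(dim: int, base: float = 10000.0) -> list[float]:
    # Standard RoPE inverse frequencies: base^(-2i/dim) for i in [0, dim/2).
    return [base ** (-(2 * i) / dim) for i in range(dim // 2)]


def precompute_angles_3d(
    grid_size: tuple[int, int, int],
    dims: tuple[int, int, int],
    base: float = 10000.0,
) -> list[list[float]]:
    """Return T*H*W rows, each with sum(dims) rotation angles."""
    T, H, W = grid_size
    f_t, f_h, f_w = (axis_inv_freqs(d, base) for d in dims)
    rows = []
    # Assumption: tokens are flattened frame-major, then row, then column.
    for t in range(T):
        for h in range(H):
            for w in range(W):
                half = ([t * f for f in f_t]
                        + [h * f for f in f_h]
                        + [w * f for f in f_w])
                # Assumption: angles are duplicated so each channel pair shares one.
                rows.append(half + half)
    return rows


angles = precompute_angles_3d(grid_size=(2, 3, 4), dims=(8, 4, 4))
```

With `grid_size=(2, 3, 4)` and per-axis dimensions summing to 16, the result has 24 rows of 16 angles, matching the `[THW, head_dim]` shape documented above.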
fastvideo.layers.rotary_embedding_3d.RotaryPositionalEmbedding3D.register_grid_size
Functions
fastvideo.layers.rotary_embedding_3d.apply_rotary_emb_3d
`apply_rotary_emb_3d(q: Tensor, k: Tensor, rope_module: RotaryPositionalEmbedding3D, grid_size: tuple[int, int, int]) -> tuple[Tensor, Tensor]`
Convenience function to apply 3D RoPE.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `q` | `Tensor` | Query tensor [B, num_heads, seq_len, head_dim] | required |
| `k` | `Tensor` | Key tensor [B, num_heads, seq_len, head_dim] | required |
| `rope_module` | `RotaryPositionalEmbedding3D` | RotaryPositionalEmbedding3D module | required |
| `grid_size` | `tuple[int, int, int]` | (T, H, W) grid dimensions | required |
Returns:
| Type | Description |
|---|---|
| `(q_rotated, k_rotated)` | Rotated tensors |
fastvideo.layers.rotary_embedding_3d.broadcast
Broadcast and concatenate tensors along a dimension.
fastvideo.layers.rotary_embedding_3d.rotate_half
Rotate half the hidden dims of the input.
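Two pairing conventions for this operation are common in RoPE code, and this page does not say which one the function uses. The sketch below shows both on plain Python lists; the function names `rotate_half_split` and `rotate_half_interleaved` are hypothetical and exist only to tell the variants apart.

```python
def rotate_half_split(x):
    # Pairs channel i with channel i + len(x) // 2:
    # [x1, x2] -> [-x2, x1], where x1 and x2 are the two halves.
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def rotate_half_interleaved(x):
    # Pairs adjacent channels (2i, 2i + 1):
    # [a, b, c, d] -> [-b, a, -d, c].
    out = []
    for i in range(0, len(x), 2):
        out.extend([-x[i + 1], x[i]])
    return out


x = [1, 2, 3, 4]
split_result = rotate_half_split(x)              # [-3, -4, 1, 2]
interleaved_result = rotate_half_interleaved(x)  # [-2, 1, -4, 3]
```

Either variant acts as a 90-degree rotation on each channel pair, so applying it twice negates the input. The choice matters only in that the precomputed angles must follow the same pairing.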