# layer

## Classes
### fastvideo.attention.layer.DistributedAttention
DistributedAttention(num_heads: int, head_size: int, num_kv_heads: int | None = None, softmax_scale: float | None = None, causal: bool = False, supported_attention_backends: tuple[AttentionBackendEnum, ...] | None = None, prefix: str = '', **extra_impl_args)
Bases: Module
Distributed attention layer.
Source code in fastvideo/attention/layer.py
#### Functions
##### fastvideo.attention.layer.DistributedAttention.forward
forward(q: Tensor, k: Tensor, v: Tensor, replicated_q: Tensor | None = None, replicated_k: Tensor | None = None, replicated_v: Tensor | None = None) -> tuple[Tensor, Tensor | None]
Forward pass for distributed attention.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| q | Tensor | Query tensor [batch_size, seq_len, num_heads, head_dim] | required |
| k | Tensor | Key tensor [batch_size, seq_len, num_heads, head_dim] | required |
| v | Tensor | Value tensor [batch_size, seq_len, num_heads, head_dim] | required |
| replicated_q | Optional[Tensor] | Replicated query tensor, typically for text tokens | None |
| replicated_k | Optional[Tensor] | Replicated key tensor | None |
| replicated_v | Optional[Tensor] | Replicated value tensor | None |

Returns:

| Type | Description |
|---|---|
| tuple[Tensor, Tensor \| None] | A tuple containing o (torch.Tensor), the output tensor after attention for the main sequence, and replicated_o (Optional[torch.Tensor]), the output tensor for the replicated tokens, if provided |
Source code in fastvideo/attention/layer.py
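
The sketch below is illustrative only and not part of the generated reference: it shows the tensor layouts and the (o, replicated_o) return pair described above. It assumes the distributed environment, sequence-parallel groups, and a supported attention backend have already been initialized elsewhere, and all sizes are placeholder values.

```python
# Hypothetical usage sketch for DistributedAttention.forward.
# Assumes fastvideo's distributed state and an attention backend are already
# set up elsewhere (not shown); shapes follow the parameter table above.
import torch

from fastvideo.attention.layer import DistributedAttention

num_heads, head_dim = 16, 64
attn = DistributedAttention(
    num_heads=num_heads,
    head_size=head_dim,
    causal=False,  # bidirectional attention over video tokens
)

batch_size, seq_len, text_len = 2, 1024, 77
# Main sequence tensors: [batch_size, seq_len, num_heads, head_dim]
q = torch.randn(batch_size, seq_len, num_heads, head_dim)
k = torch.randn(batch_size, seq_len, num_heads, head_dim)
v = torch.randn(batch_size, seq_len, num_heads, head_dim)
# Replicated tensors (e.g. text tokens) use the same layout.
rq = torch.randn(batch_size, text_len, num_heads, head_dim)
rk = torch.randn(batch_size, text_len, num_heads, head_dim)
rv = torch.randn(batch_size, text_len, num_heads, head_dim)

o, replicated_o = attn(q, k, v, rq, rk, rv)
# o:            [batch_size, seq_len, num_heads, head_dim]
# replicated_o: [batch_size, text_len, num_heads, head_dim]
#               (None when no replicated tensors are passed)
```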
### fastvideo.attention.layer.DistributedAttention_VSA
DistributedAttention_VSA(num_heads: int, head_size: int, num_kv_heads: int | None = None, softmax_scale: float | None = None, causal: bool = False, supported_attention_backends: tuple[AttentionBackendEnum, ...] | None = None, prefix: str = '', **extra_impl_args)
Bases: DistributedAttention
Distributed attention layer with VSA support.
Source code in fastvideo/attention/layer.py
#### Functions
##### fastvideo.attention.layer.DistributedAttention_VSA.forward
forward(q: Tensor, k: Tensor, v: Tensor, replicated_q: Tensor | None = None, replicated_k: Tensor | None = None, replicated_v: Tensor | None = None, gate_compress: Tensor | None = None) -> tuple[Tensor, Tensor | None]
Forward pass for distributed attention.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| q | Tensor | Query tensor [batch_size, seq_len, num_heads, head_dim] | required |
| k | Tensor | Key tensor [batch_size, seq_len, num_heads, head_dim] | required |
| v | Tensor | Value tensor [batch_size, seq_len, num_heads, head_dim] | required |
| gate_compress | Tensor | Gate compress tensor [batch_size, seq_len, num_heads, head_dim] | None |
| replicated_q | Optional[Tensor] | Replicated query tensor, typically for text tokens | None |
| replicated_k | Optional[Tensor] | Replicated key tensor | None |
| replicated_v | Optional[Tensor] | Replicated value tensor | None |

Returns:

| Type | Description |
|---|---|
| tuple[Tensor, Tensor \| None] | A tuple containing o (torch.Tensor), the output tensor after attention for the main sequence, and replicated_o (Optional[torch.Tensor]), the output tensor for the replicated tokens, if provided |
Source code in fastvideo/attention/layer.py
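
As with the base class, the following is a hedged, illustrative sketch rather than part of the generated reference. The only difference from DistributedAttention.forward is the extra gate_compress tensor, which shares the [batch_size, seq_len, num_heads, head_dim] layout; setup of the distributed state and the VSA backend is assumed and not shown.

```python
# Hypothetical usage sketch for DistributedAttention_VSA.forward.
# Distributed/VSA backend setup is assumed to have happened elsewhere.
import torch

from fastvideo.attention.layer import DistributedAttention_VSA

num_heads, head_dim = 16, 64
attn_vsa = DistributedAttention_VSA(num_heads=num_heads, head_size=head_dim)

batch_size, seq_len = 2, 1024
shape = (batch_size, seq_len, num_heads, head_dim)
q = torch.randn(shape)
k = torch.randn(shape)
v = torch.randn(shape)
gate_compress = torch.randn(shape)  # gating tensor consumed by the VSA backend

# gate_compress is passed by keyword because the replicated_* arguments
# precede it in the signature.
o, replicated_o = attn_vsa(q, k, v, gate_compress=gate_compress)
```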
### fastvideo.attention.layer.LocalAttention
LocalAttention(num_heads: int, head_size: int, num_kv_heads: int | None = None, softmax_scale: float | None = None, causal: bool = False, supported_attention_backends: tuple[AttentionBackendEnum, ...] | None = None, **extra_impl_args)
Bases: Module
Local attention layer.
Source code in fastvideo/attention/layer.py
#### Functions
##### fastvideo.attention.layer.LocalAttention.forward

forward(q: Tensor, k: Tensor, v: Tensor) -> Tensor

Apply local attention between query, key and value tensors.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| q | Tensor | Query tensor of shape [batch_size, seq_len, num_heads, head_dim] | required |
| k | Tensor | Key tensor of shape [batch_size, seq_len, num_heads, head_dim] | required |
| v | Tensor | Value tensor of shape [batch_size, seq_len, num_heads, head_dim] | required |

Returns:

| Type | Description |
|---|---|
| Tensor | Output tensor after local attention |
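
A final illustrative sketch, again not part of the generated reference: LocalAttention runs attention on a single rank without the replicated-token path. It assumes a supported attention backend is available, and the sizes are placeholders.

```python
# Hypothetical usage sketch for LocalAttention.forward.
# A supported attention backend is assumed to be available.
import torch

from fastvideo.attention.layer import LocalAttention

num_heads, head_dim = 8, 64
local_attn = LocalAttention(num_heads=num_heads, head_size=head_dim)

batch_size, seq_len = 1, 256
q = torch.randn(batch_size, seq_len, num_heads, head_dim)
k = torch.randn(batch_size, seq_len, num_heads, head_dim)
v = torch.randn(batch_size, seq_len, num_heads, head_dim)

out = local_attn(q, k, v)  # same layout as q, per the table above
```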