fastvideo.v1.attention.layer#

Module Contents#

Classes#

DistributedAttention

Distributed attention layer.

LocalAttention

Local (non-distributed) attention layer.

API#

class fastvideo.v1.attention.layer.DistributedAttention(num_heads: int, head_size: int, num_kv_heads: Optional[int] = None, softmax_scale: Optional[float] = None, causal: bool = False, supported_attention_backends: Optional[Tuple[fastvideo.v1.platforms._Backend, ...]] = None, prefix: str = '', **extra_impl_args)[source]#

Bases: torch.nn.Module

Distributed attention layer.

Initialization

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, replicated_q: Optional[torch.Tensor] = None, replicated_k: Optional[torch.Tensor] = None, replicated_v: Optional[torch.Tensor] = None) tuple[torch.Tensor, Optional[torch.Tensor]][source]#

Forward pass for distributed attention.

Parameters:
  • q (torch.Tensor) – Query tensor [batch_size, seq_len, num_heads, head_dim]

  • k (torch.Tensor) – Key tensor [batch_size, seq_len, num_heads, head_dim]

  • v (torch.Tensor) – Value tensor [batch_size, seq_len, num_heads, head_dim]

  • replicated_q (Optional[torch.Tensor]) – Replicated query tensor, typically for text tokens

  • replicated_k (Optional[torch.Tensor]) – Replicated key tensor

  • replicated_v (Optional[torch.Tensor]) – Replicated value tensor

Returns:

A tuple containing:
  • o (torch.Tensor) – Output tensor after attention for the main sequence

  • replicated_o (Optional[torch.Tensor]) – Output tensor for the replicated tokens, if provided

Return type:

Tuple[torch.Tensor, Optional[torch.Tensor]]
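
A minimal usage sketch based on the constructor and forward signatures above. It assumes the distributed setup that this layer relies on (e.g. sequence-parallel process groups) has already been initialized elsewhere, that head_size is the per-head dimension of the q/k/v tensors, and that the default attention backend is available; shapes follow the documented [batch_size, seq_len, num_heads, head_dim] layout.

import torch

from fastvideo.v1.attention.layer import DistributedAttention

# Assumed example dimensions.
batch_size, seq_len, text_len = 2, 128, 32
num_heads, head_dim = 8, 64

attn = DistributedAttention(num_heads=num_heads, head_size=head_dim, causal=False)

# Main (sharded) sequence tensors.
q = torch.randn(batch_size, seq_len, num_heads, head_dim)
k = torch.randn(batch_size, seq_len, num_heads, head_dim)
v = torch.randn(batch_size, seq_len, num_heads, head_dim)

# Optional replicated tensors, typically carrying text tokens that are
# not sharded across the sequence-parallel group.
replicated_q = torch.randn(batch_size, text_len, num_heads, head_dim)
replicated_k = torch.randn(batch_size, text_len, num_heads, head_dim)
replicated_v = torch.randn(batch_size, text_len, num_heads, head_dim)

o, replicated_o = attn(q, k, v, replicated_q, replicated_k, replicated_v)
# o:            attention output for the main sequence
# replicated_o: attention output for the replicated tokens (None if the
#               replicated inputs were not passed)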

class fastvideo.v1.attention.layer.LocalAttention(num_heads: int, head_size: int, num_kv_heads: Optional[int] = None, softmax_scale: Optional[float] = None, causal: bool = False, supported_attention_backends: Optional[Tuple[fastvideo.v1.platforms._Backend, ...]] = None, **extra_impl_args)[source]#

Bases: torch.nn.Module

Local (non-distributed) attention layer.

Initialization

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) torch.Tensor[source]#

Apply local attention between query, key and value tensors.

Parameters:
  • q (torch.Tensor) – Query tensor of shape [batch_size, seq_len, num_heads, head_dim]

  • k (torch.Tensor) – Key tensor of shape [batch_size, seq_len, num_heads, head_dim]

  • v (torch.Tensor) – Value tensor of shape [batch_size, seq_len, num_heads, head_dim]

Returns:

Output tensor after local attention

Return type:

torch.Tensor
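
A minimal usage sketch for LocalAttention based on the signatures above. It assumes head_size equals the per-head dimension of the q/k/v tensors and that a supported attention backend is available on the current device; the output is assumed to keep the same [batch_size, seq_len, num_heads, head_dim] layout as the query.

import torch

from fastvideo.v1.attention.layer import LocalAttention

# Assumed example dimensions.
batch_size, seq_len, num_heads, head_dim = 2, 128, 8, 64

attn = LocalAttention(num_heads=num_heads, head_size=head_dim, causal=False)

q = torch.randn(batch_size, seq_len, num_heads, head_dim)
k = torch.randn(batch_size, seq_len, num_heads, head_dim)
v = torch.randn(batch_size, seq_len, num_heads, head_dim)

# Plain attention over the local tensors; no sequence parallelism involved.
out = attn(q, k, v)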