fastvideo.v1.attention.backends.abstract

Module Contents

Classes

| Class | Description |
|---|---|
| AttentionBackend | Abstract class for attention backends. |
| AttentionImpl | |
| AttentionLayer | |
| AttentionMetadata | Attention metadata for prefill and decode batched together. |
| AttentionMetadataBuilder | Abstract class for attention metadata builders. |

Data

T

API
- class fastvideo.v1.attention.backends.abstract.AttentionBackend

  Bases: abc.ABC

  Abstract class for attention backends.

  - abstract static get_builder_cls() → Type[fastvideo.v1.attention.backends.abstract.AttentionMetadataBuilder]

  - abstract static get_impl_cls() → Type[fastvideo.v1.attention.backends.abstract.AttentionImpl]

  - abstract static get_metadata_cls() → Type[fastvideo.v1.attention.backends.abstract.AttentionMetadata]
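A concrete backend ties these three lookups together. Below is a minimal sketch, assuming hypothetical `MyMetadata` / `MyMetadataBuilder` / `MyImpl` classes and using plain PyTorch scaled-dot-product attention as a stand-in kernel; it is not one of FastVideo's shipped backends.

```python
from typing import Type

import torch

from fastvideo.v1.attention.backends.abstract import (AttentionBackend,
                                                      AttentionImpl,
                                                      AttentionMetadata,
                                                      AttentionMetadataBuilder)


class MyMetadata(AttentionMetadata):
    """Hypothetical metadata class; carries nothing in this sketch."""


class MyMetadataBuilder(AttentionMetadataBuilder):
    def build(self, current_timestep, forward_batch, fastvideo_args) -> MyMetadata:
        # Trivial builder stub; see the fuller builder sketch at the end of
        # this page for on-device tensor construction.
        return MyMetadata()


class MyImpl(AttentionImpl):
    def forward(self, query: torch.Tensor, key: torch.Tensor,
                value: torch.Tensor,
                attn_metadata: MyMetadata) -> torch.Tensor:
        # Plain SDPA as a stand-in attention kernel.
        return torch.nn.functional.scaled_dot_product_attention(query, key, value)


class MyBackend(AttentionBackend):
    @staticmethod
    def get_builder_cls() -> Type[AttentionMetadataBuilder]:
        return MyMetadataBuilder

    @staticmethod
    def get_impl_cls() -> Type[AttentionImpl]:
        return MyImpl

    @staticmethod
    def get_metadata_cls() -> Type[AttentionMetadata]:
        return MyMetadata
```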
- class fastvideo.v1.attention.backends.abstract.AttentionImpl(num_heads: int, head_size: int, softmax_scale: float, causal: bool = False, num_kv_heads: Optional[int] = None, prefix: str = '', **extra_impl_args)

  Bases: abc.ABC, typing.Generic[fastvideo.v1.attention.backends.abstract.T]

  - abstract forward(query: torch.Tensor, key: torch.Tensor, value: torch.Tensor, attn_metadata: fastvideo.v1.attention.backends.abstract.T) → torch.Tensor
  - postprocess_output(output: torch.Tensor, attn_metadata: fastvideo.v1.attention.backends.abstract.T) → torch.Tensor

    Postprocess the output tensor after the attention operation.

    The default implementation returns the tensor unchanged. Subclasses can override this to implement custom postprocessing such as untiling, scaling, or other transformations.

    Called BEFORE all_to_all for distributed attention.

    Parameters:
      output – The output tensor from the attention operation
      attn_metadata – Metadata for the attention operation

    Returns:
      Postprocessed output tensor
  - preprocess_qkv(qkv: torch.Tensor, attn_metadata: fastvideo.v1.attention.backends.abstract.T) → torch.Tensor

    Preprocess the QKV tensor before performing the attention operation.

    The default implementation returns the tensor unchanged. Subclasses can override this to implement custom preprocessing such as reshaping, tiling, scaling, or other transformations.

    Called AFTER all_to_all for distributed attention.

    Parameters:
      qkv – The query-key-value tensor
      attn_metadata – Metadata for the attention operation

    Returns:
      Processed QKV tensor
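As a sketch of how the two hooks compose around the kernel in a distributed setting: a subclass might fold sequence tiles into the batch dimension in preprocess_qkv and undo it in postprocess_output. The 4-D shapes below are illustrative assumptions, not FastVideo's actual tensor layout, and `TiledImpl` is hypothetical.

```python
import torch

from fastvideo.v1.attention.backends.abstract import AttentionImpl


class TiledImpl(AttentionImpl):
    """Hypothetical impl: attends independently over two sequence tiles."""

    TILES = 2

    def preprocess_qkv(self, qkv: torch.Tensor, attn_metadata) -> torch.Tensor:
        # Assumed layout (batch, seq, heads, dim): fold tiles into the batch
        # dimension so the kernel attends within each tile only.
        b, s, h, d = qkv.shape
        return qkv.reshape(b * self.TILES, s // self.TILES, h, d)

    def postprocess_output(self, output: torch.Tensor, attn_metadata) -> torch.Tensor:
        # Invert the tiling applied in preprocess_qkv before results are
        # exchanged across ranks.
        bt, s, h, d = output.shape
        return output.reshape(bt // self.TILES, s * self.TILES, h, d)

    def forward(self, query, key, value, attn_metadata) -> torch.Tensor:
        return torch.nn.functional.scaled_dot_product_attention(query, key, value)
```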
- class fastvideo.v1.attention.backends.abstract.AttentionLayer

  Bases: typing.Protocol

  - forward(query: torch.Tensor, key: torch.Tensor, value: torch.Tensor, kv_cache: torch.Tensor, attn_metadata: fastvideo.v1.attention.backends.abstract.AttentionMetadata) → torch.Tensor
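Because AttentionLayer is a typing.Protocol, conformance is structural: any object whose forward matches this signature type-checks where an AttentionLayer is expected, with no inheritance required. A small illustrative helper (`run_attention` is not part of the module):

```python
import torch

from fastvideo.v1.attention.backends.abstract import (AttentionLayer,
                                                      AttentionMetadata)


def run_attention(layer: AttentionLayer, query: torch.Tensor,
                  key: torch.Tensor, value: torch.Tensor,
                  kv_cache: torch.Tensor,
                  attn_metadata: AttentionMetadata) -> torch.Tensor:
    # Accepts any class with a structurally matching forward(); no subclassing
    # of AttentionLayer is needed.
    return layer.forward(query, key, value, kv_cache, attn_metadata)
```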
- class fastvideo.v1.attention.backends.abstract.AttentionMetadata

  Attention metadata for prefill and decode batched together.
- class fastvideo.v1.attention.backends.abstract.AttentionMetadataBuilder

  Bases: abc.ABC, typing.Generic[fastvideo.v1.attention.backends.abstract.T]

  Abstract class for attention metadata builders.

  Initialization

  Create the builder and remember configuration and parameters.

  - abstract build(current_timestep: int, forward_batch: fastvideo.v1.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.v1.fastvideo_args.FastVideoArgs) → fastvideo.v1.attention.backends.abstract.T

    Build attention metadata with on-device tensors.
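The point of build is to assemble everything the kernel will need as on-device tensors before the forward pass starts. A minimal sketch, assuming a hypothetical `TimestepMetadata` with a single field; a real builder would derive its fields from the ForwardBatch.

```python
from dataclasses import dataclass

import torch

from fastvideo.v1.attention.backends.abstract import (AttentionMetadata,
                                                      AttentionMetadataBuilder)


@dataclass
class TimestepMetadata(AttentionMetadata):
    # Hypothetical field: the current denoising timestep, kept on the GPU.
    current_timestep: torch.Tensor


class TimestepMetadataBuilder(AttentionMetadataBuilder):
    def build(self, current_timestep: int, forward_batch,
              fastvideo_args) -> TimestepMetadata:
        # Materialize scalar state on-device up front so the attention kernel
        # never forces a host/device sync mid-forward.
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        return TimestepMetadata(
            current_timestep=torch.tensor(current_timestep, device=device))
```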