fastvideo.v1.attention.backends.abstract#

Module Contents#

Classes#

AttentionBackend

Abstract class for attention backends.

AttentionImpl

AttentionLayer

AttentionMetadata

Attention metadata for prefill and decode batched together.

AttentionMetadataBuilder

Abstract class for attention metadata builders.

Data#

T

API#

class fastvideo.v1.attention.backends.abstract.AttentionBackend[source]#

Bases: abc.ABC

Abstract class for attention backends.

accept_output_buffer: bool[source]#

False

abstract static get_builder_cls() → Type[fastvideo.v1.attention.backends.abstract.AttentionMetadataBuilder][source]#
abstract static get_impl_cls() → Type[fastvideo.v1.attention.backends.abstract.AttentionImpl][source]#
abstract static get_metadata_cls() → Type[fastvideo.v1.attention.backends.abstract.AttentionMetadata][source]#
abstract static get_name() → str[source]#
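A concrete backend subclasses AttentionBackend and implements the four static hooks, which tell the framework which implementation, metadata, and builder classes make up the backend. A minimal sketch, assuming the hypothetical ScaledAttentionImpl and MyMetadataBuilder classes sketched under AttentionImpl and AttentionMetadataBuilder below:

```python
from typing import Type

from fastvideo.v1.attention.backends.abstract import (
    AttentionBackend, AttentionImpl, AttentionMetadata, AttentionMetadataBuilder)


class MyAttentionBackend(AttentionBackend):
    """Hypothetical backend; the four static hooks name its parts."""

    @staticmethod
    def get_name() -> str:
        return "MY_ATTN"

    @staticmethod
    def get_impl_cls() -> Type[AttentionImpl]:
        # Hypothetical; sketched under AttentionImpl below.
        return ScaledAttentionImpl

    @staticmethod
    def get_metadata_cls() -> Type[AttentionMetadata]:
        return AttentionMetadata

    @staticmethod
    def get_builder_cls() -> Type[AttentionMetadataBuilder]:
        # Hypothetical; sketched under AttentionMetadataBuilder below.
        return MyMetadataBuilder
```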
class fastvideo.v1.attention.backends.abstract.AttentionImpl(num_heads: int, head_size: int, softmax_scale: float, causal: bool = False, num_kv_heads: Optional[int] = None, prefix: str = '', **extra_impl_args)[source]#

Bases: abc.ABC, typing.Generic[fastvideo.v1.attention.backends.abstract.T]

abstract forward(query: torch.Tensor, key: torch.Tensor, value: torch.Tensor, attn_metadata: fastvideo.v1.attention.backends.abstract.T) → torch.Tensor[source]#
postprocess_output(output: torch.Tensor, attn_metadata: fastvideo.v1.attention.backends.abstract.T) → torch.Tensor[source]#

Postprocess the output tensor after the attention operation.

Default implementation returns the tensor unchanged. Subclasses can override this to implement custom postprocessing like untiling, scaling, or other transformations.

Called BEFORE all_to_all in distributed attention.

Parameters:
  • output – The output tensor from the attention operation

  • attn_metadata – Metadata for the attention operation

Returns:

Postprocessed output tensor
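For instance, a subclass might override postprocess_output to rescale the kernel's result. A minimal sketch, with a naive matmul standing in for a real attention kernel (ScaledAttentionImpl and the scaling are hypothetical):

```python
import torch

from fastvideo.v1.attention.backends.abstract import AttentionImpl, AttentionMetadata


class ScaledAttentionImpl(AttentionImpl[AttentionMetadata]):
    """Hypothetical impl that rescales its output in postprocess_output."""

    def forward(self, query: torch.Tensor, key: torch.Tensor,
                value: torch.Tensor,
                attn_metadata: AttentionMetadata) -> torch.Tensor:
        # Naive stand-in; a real impl would dispatch to an optimized kernel
        # and use the softmax_scale passed to __init__.
        scale = query.shape[-1] ** -0.5
        scores = torch.matmul(query, key.transpose(-2, -1)) * scale
        return torch.matmul(scores.softmax(dim=-1), value)

    def postprocess_output(self, output: torch.Tensor,
                           attn_metadata: AttentionMetadata) -> torch.Tensor:
        # Runs BEFORE all_to_all in distributed attention.
        return output * 0.5  # illustrative scaling only
```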

preprocess_qkv(qkv: torch.Tensor, attn_metadata: fastvideo.v1.attention.backends.abstract.T) → torch.Tensor[source]#

Preprocess QKV tensor before performing attention operation.

Default implementation returns the tensor unchanged. Subclasses can override this to implement custom preprocessing like reshaping, tiling, scaling, or other transformations.

Called AFTER all_to_all in distributed attention.

Parameters:
  • qkv – The query-key-value tensor

  • attn_metadata – Metadata for the attention operation

Returns:

Processed QKV tensor
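Analogously, preprocess_qkv can reshape the fused QKV tensor before the kernel runs. A sketch under an assumed tiled layout (TiledAttentionImpl and the layout are hypothetical):

```python
import torch

from fastvideo.v1.attention.backends.abstract import AttentionImpl, AttentionMetadata


class TiledAttentionImpl(AttentionImpl[AttentionMetadata]):
    """Hypothetical impl that untiles QKV before attention."""

    def preprocess_qkv(self, qkv: torch.Tensor,
                       attn_metadata: AttentionMetadata) -> torch.Tensor:
        # Runs AFTER all_to_all in distributed attention. Illustrative only:
        # fold an assumed tile axis into the sequence axis,
        # (batch, tiles, seq, dim) -> (batch, tiles * seq, dim).
        return qkv.flatten(1, 2)

    def forward(self, query: torch.Tensor, key: torch.Tensor,
                value: torch.Tensor,
                attn_metadata: AttentionMetadata) -> torch.Tensor:
        scale = query.shape[-1] ** -0.5
        scores = torch.matmul(query, key.transpose(-2, -1)) * scale
        return torch.matmul(scores.softmax(dim=-1), value)
```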

class fastvideo.v1.attention.backends.abstract.AttentionLayer[source]#

Bases: typing.Protocol

forward(query: torch.Tensor, key: torch.Tensor, value: torch.Tensor, kv_cache: torch.Tensor, attn_metadata: fastvideo.v1.attention.backends.abstract.AttentionMetadata) → torch.Tensor[source]#
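Because AttentionLayer is a typing.Protocol, any class whose forward matches this signature satisfies it structurally; no inheritance is required. A minimal sketch (SimpleLayer is hypothetical):

```python
import torch

from fastvideo.v1.attention.backends.abstract import AttentionLayer, AttentionMetadata


class SimpleLayer:
    """Hypothetical layer; satisfies AttentionLayer by matching its signature."""

    def forward(self, query: torch.Tensor, key: torch.Tensor,
                value: torch.Tensor, kv_cache: torch.Tensor,
                attn_metadata: AttentionMetadata) -> torch.Tensor:
        scale = query.shape[-1] ** -0.5
        scores = torch.matmul(query, key.transpose(-2, -1)) * scale
        return torch.matmul(scores.softmax(dim=-1), value)


layer: AttentionLayer = SimpleLayer()  # accepted by static type checkers
```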
class fastvideo.v1.attention.backends.abstract.AttentionMetadata[source]#

Attention metadata for prefill and decode batched together.

asdict_zerocopy(skip_fields: Optional[Set[str]] = None) → Dict[str, Any][source]#

Similar to dataclasses.asdict, but avoids deepcopying.

current_timestep: int[source]#

None
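Backends typically extend AttentionMetadata with their own fields; asdict_zerocopy then returns those fields as a dict without deep-copying tensor data. A sketch, assuming AttentionMetadata is a dataclass whose only required field is current_timestep (MyMetadata is hypothetical):

```python
import dataclasses

import torch

from fastvideo.v1.attention.backends.abstract import AttentionMetadata


@dataclasses.dataclass
class MyMetadata(AttentionMetadata):
    """Hypothetical metadata with an extra tensor field."""
    attn_mask: torch.Tensor = None


meta = MyMetadata(current_timestep=3, attn_mask=torch.zeros(4, 4))
d = meta.asdict_zerocopy()
# Tensor values in d are shared with meta, not copied.
```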

class fastvideo.v1.attention.backends.abstract.AttentionMetadataBuilder[source]#

Bases: abc.ABC, typing.Generic[fastvideo.v1.attention.backends.abstract.T]

Abstract class for attention metadata builders.

Initialization

Create the builder and store the configuration and parameters it needs.

abstract build(current_timestep: int, forward_batch: fastvideo.v1.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.v1.fastvideo_args.FastVideoArgs) → fastvideo.v1.attention.backends.abstract.T[source]#

Build attention metadata with on-device tensors.

abstract prepare() → None[source]#

Prepare for one batch.
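A minimal sketch of a concrete builder, assuming plain AttentionMetadata is an acceptable metadata type (MyMetadataBuilder is hypothetical):

```python
from fastvideo.v1.attention.backends.abstract import (
    AttentionMetadata, AttentionMetadataBuilder)
from fastvideo.v1.fastvideo_args import FastVideoArgs
from fastvideo.v1.pipelines.pipeline_batch_info import ForwardBatch


class MyMetadataBuilder(AttentionMetadataBuilder[AttentionMetadata]):
    """Hypothetical builder returning plain AttentionMetadata."""

    def prepare(self) -> None:
        # Reset any per-batch state before building metadata for a new batch.
        pass

    def build(self, current_timestep: int,
              forward_batch: ForwardBatch,
              fastvideo_args: FastVideoArgs) -> AttentionMetadata:
        # A real builder would derive on-device tensors from forward_batch.
        return AttentionMetadata(current_timestep=current_timestep)
```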

fastvideo.v1.attention.backends.abstract.T[source]#

‘TypeVar(…)’