fastvideo.pipelines.stages.text_encoding#

Prompt encoding stages for diffusion pipelines.

This module contains implementations of prompt encoding stages for diffusion pipelines.

Module Contents#

Classes#

TextEncodingStage

Stage for encoding text prompts into embeddings for diffusion models.

Data#

logger

API#

class fastvideo.pipelines.stages.text_encoding.TextEncodingStage(text_encoders, tokenizers)[source]#

Bases: fastvideo.pipelines.stages.base.PipelineStage

Stage for encoding text prompts into embeddings for diffusion models.

This stage handles the encoding of text prompts into the embedding space expected by the diffusion model.

Initialization

Initialize the prompt encoding stage.

Parameters:
  • text_encoders – The text encoder models used to produce prompt embeddings.

  • tokenizers – The tokenizers corresponding to each text encoder, in the same order.

encode_text(text: str | list[str], fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs, encoder_index: int | list[int] | None = None, return_attention_mask: bool = False, return_type: str = 'list', device: torch.device | str | None = None, dtype: torch.dtype | None = None, max_length: int | None = None, truncation: bool | None = None, padding: bool | str | None = None)[source]#

Encode plain text using selected text encoder(s) and return embeddings.

Parameters:
  • text – A single string or a list of strings to encode.

  • fastvideo_args – The inference arguments providing pipeline config, including tokenizer and encoder settings, preprocess and postprocess functions.

  • encoder_index – Encoder selector by index. Accepts an int or list of ints.

  • return_attention_mask – If True, also return attention masks for each selected encoder.

  • return_type – “list” (default) returns a list aligned with selection; “dict” returns a dict keyed by encoder index as a string; “stack” stacks along a new first dimension (requires matching shapes).

  • device – Optional device override for inputs; defaults to local torch device.

  • dtype – Optional dtype to cast returned embeddings to.

  • max_length – Optional per-call tokenizer override.

  • truncation – Optional per-call tokenizer override.

  • padding – Optional per-call tokenizer override.

Returns:

  • list: List[Tensor] or (List[Tensor], List[Tensor])

  • dict: Dict[str, Tensor] or (Dict[str, Tensor], Dict[str, Tensor])

  • stack: Tensor of shape [num_encoders, …] or a tuple with stacked attention masks

Return type:

Depending on return_type and return_attention_mask
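The return_type contract above can be sketched without the library. The helper below is a toy mirror of that contract (the function name is hypothetical, and plain Python lists stand in for torch.Tensor), not the actual implementation:

```python
def shape_outputs(outputs, encoder_indices, return_type="list"):
    """Toy mirror of encode_text's documented return_type handling.

    `outputs` holds one embedding per selected encoder; plain lists
    stand in for torch.Tensor, so 'stack' is emulated rather than real.
    """
    if return_type == "list":
        # Aligned with the selection order of encoder_indices.
        return outputs
    if return_type == "dict":
        # Keyed by encoder index as a string, per the docs above.
        return {str(i): out for i, out in zip(encoder_indices, outputs)}
    if return_type == "stack":
        # torch.stack would require matching shapes; emulate that check.
        if len({len(out) for out in outputs}) > 1:
            raise ValueError("'stack' requires matching shapes")
        # Nesting adds the new leading [num_encoders, ...] dimension.
        return [list(out) for out in outputs]
    raise ValueError(f"unknown return_type: {return_type!r}")
```

For example, `shape_outputs([[0.1, 0.2], [0.3, 0.4]], [0, 1], "dict")` yields a dict keyed by `"0"` and `"1"`, matching the string-keyed layout described above.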

forward(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) fastvideo.pipelines.pipeline_batch_info.ForwardBatch[source]#

Encode the prompt into text encoder hidden states.

Parameters:
  • batch – The current batch information.

  • fastvideo_args – The inference arguments.

Returns:

The batch with encoded prompt embeddings.
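forward() follows the common pipeline-stage contract: take the batch, attach embeddings, and return the same batch. A minimal self-contained sketch of that pattern (all class and field names here are hypothetical stand-ins, not the real ForwardBatch or stage API):

```python
from dataclasses import dataclass, field

@dataclass
class BatchSketch:
    """Hypothetical stand-in for ForwardBatch (not the real class)."""
    prompt: str = ""
    prompt_embeds: list = field(default_factory=list)

class EncodingStageSketch:
    """Toy stage mirroring the forward() contract: encode the batch's
    prompt with every encoder, attach the results, return the batch."""
    def __init__(self, text_encoders):
        self.text_encoders = text_encoders

    def forward(self, batch):
        for encoder in self.text_encoders:
            batch.prompt_embeds.append(encoder(batch.prompt))
        return batch  # same object, now carrying embeddings

# A fake "encoder" so the sketch runs standalone.
fake_encoder = lambda text: [float(len(text))]
batch = EncodingStageSketch([fake_encoder]).forward(BatchSketch(prompt="a cat"))
```

The design point this illustrates: stages mutate and return the batch in place, so they compose by simple chaining.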

verify_input(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) fastvideo.pipelines.stages.validators.VerificationResult[source]#

Verify text encoding stage inputs.

verify_output(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) fastvideo.pipelines.stages.validators.VerificationResult[source]#

Verify text encoding stage outputs.
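Both verify hooks return a VerificationResult. A toy illustration of the check-accumulation pattern such verifiers typically use (the class and method names below are hypothetical, not the real validators API):

```python
class ResultSketch:
    """Hypothetical stand-in for VerificationResult: named pass/fail checks."""
    def __init__(self):
        self.checks = {}

    def add_check(self, name, passed):
        self.checks[name] = bool(passed)
        return self

    def is_valid(self):
        # Valid only if every recorded check passed.
        return all(self.checks.values())

def verify_input_sketch(batch):
    # Mirrors verify_input's role: confirm the fields the stage
    # reads are present and well-typed before forward() runs.
    result = ResultSketch()
    result.add_check("prompt_is_str", isinstance(batch.get("prompt"), str))
    return result
```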

fastvideo.pipelines.stages.text_encoding.logger[source]#

'init_logger(…)'