fastvideo.pipelines.stages.text_encoding#

Prompt encoding stages for diffusion pipelines.

This module contains implementations of prompt encoding stages for diffusion pipelines.

Module Contents#

Classes#

TextEncodingStage

Stage for encoding text prompts into embeddings for diffusion models.

Data#

logger

API#

class fastvideo.pipelines.stages.text_encoding.TextEncodingStage(text_encoders, tokenizers)[source]#

Bases: fastvideo.pipelines.stages.base.PipelineStage

Stage for encoding text prompts into embeddings for diffusion models.

This stage handles the encoding of text prompts into the embedding space expected by the diffusion model.

Initialization

Initialize the prompt encoding stage.

Parameters:
  • text_encoders – The text encoder models used to produce prompt embeddings.

  • tokenizers – The tokenizers corresponding to each text encoder, in the same order.

encode_text(text: str | list[str], fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs, encoder_index: int | list[int] | None = None, return_attention_mask: bool = False, return_type: str = 'list', device: torch.device | str | None = None, dtype: torch.dtype | None = None, max_length: int | None = None, truncation: bool | None = None, padding: bool | str | None = None)[source]#

Encode plain text using selected text encoder(s) and return embeddings.

Parameters:
  • text – A single string or a list of strings to encode.

  • fastvideo_args – The inference arguments providing pipeline config, including tokenizer and encoder settings, preprocess and postprocess functions.

  • encoder_index – Encoder selector by index. Accepts an int or list of ints.

  • return_attention_mask – If True, also return attention masks for each selected encoder.

  • return_type – “list” (default) returns a list aligned with selection; “dict” returns a dict keyed by encoder index as a string; “stack” stacks along a new first dimension (requires matching shapes).

  • device – Optional device override for inputs; defaults to local torch device.

  • dtype – Optional dtype to cast returned embeddings to.

  • max_length – Optional per-call tokenizer override.

  • truncation – Optional per-call tokenizer override.

  • padding – Optional per-call tokenizer override.

Returns:

  • list: List[Tensor] or (List[Tensor], List[Tensor])

  • dict: Dict[str, Tensor] or (Dict[str, Tensor], Dict[str, Tensor])

  • stack: Tensor of shape [num_encoders, …] or a tuple with stacked attention masks

Return type:

Depending on return_type and return_attention_mask
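The return_type contract above can be sketched without the library. The helper below is a toy mirror of that contract (the function name is hypothetical, and plain Python lists stand in for torch.Tensor), not the actual implementation:

```python
def shape_outputs(outputs, encoder_indices, return_type="list"):
    """Toy mirror of encode_text's documented return_type handling.

    `outputs` holds one embedding per selected encoder; plain lists
    stand in for torch.Tensor, so 'stack' is emulated rather than real.
    """
    if return_type == "list":
        # Aligned with the selection order of encoder_indices.
        return outputs
    if return_type == "dict":
        # Keyed by encoder index as a string, per the docs above.
        return {str(i): out for i, out in zip(encoder_indices, outputs)}
    if return_type == "stack":
        # torch.stack would require matching shapes; emulate that check.
        if len({len(out) for out in outputs}) > 1:
            raise ValueError("'stack' requires matching shapes")
        # Nesting adds the new leading [num_encoders, ...] dimension.
        return [list(out) for out in outputs]
    raise ValueError(f"unknown return_type: {return_type!r}")
```

For example, `shape_outputs([[0.1, 0.2], [0.3, 0.4]], [0, 1], "dict")` yields a dict keyed by `"0"` and `"1"`, matching the string-keyed layout described above.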

forward(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) fastvideo.pipelines.pipeline_batch_info.ForwardBatch[source]#

Encode the prompt into text encoder hidden states.

Parameters:
  • batch – The current batch information.

  • fastvideo_args – The inference arguments.

Returns:

The batch with encoded prompt embeddings.
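forward() follows the common pipeline-stage contract: take the batch, attach embeddings, and return the same batch. A minimal self-contained sketch of that pattern (all class and field names here are hypothetical stand-ins, not the real ForwardBatch or stage API):

```python
from dataclasses import dataclass, field

@dataclass
class BatchSketch:
    """Hypothetical stand-in for ForwardBatch (not the real class)."""
    prompt: str = ""
    prompt_embeds: list = field(default_factory=list)

class EncodingStageSketch:
    """Toy stage mirroring the forward() contract: encode the batch's
    prompt with every encoder, attach the results, return the batch."""
    def __init__(self, text_encoders):
        self.text_encoders = text_encoders

    def forward(self, batch):
        for encoder in self.text_encoders:
            batch.prompt_embeds.append(encoder(batch.prompt))
        return batch  # same object, now carrying embeddings

# A fake "encoder" so the sketch runs standalone.
fake_encoder = lambda text: [float(len(text))]
batch = EncodingStageSketch([fake_encoder]).forward(BatchSketch(prompt="a cat"))
```

The design point this illustrates: stages mutate and return the batch in place, so they compose by simple chaining.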

verify_input(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) fastvideo.pipelines.stages.validators.VerificationResult[source]#

Verify text encoding stage inputs.

verify_output(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) fastvideo.pipelines.stages.validators.VerificationResult[source]#

Verify text encoding stage outputs.
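Both verify hooks return a VerificationResult. A toy illustration of the check-accumulation pattern such verifiers typically use (the class and method names below are hypothetical, not the real validators API):

```python
class ResultSketch:
    """Hypothetical stand-in for VerificationResult: named pass/fail checks."""
    def __init__(self):
        self.checks = {}

    def add_check(self, name, passed):
        self.checks[name] = bool(passed)
        return self

    def is_valid(self):
        # Valid only if every recorded check passed.
        return all(self.checks.values())

def verify_input_sketch(batch):
    # Mirrors verify_input's role: confirm the fields the stage
    # reads are present and well-typed before forward() runs.
    result = ResultSketch()
    result.add_check("prompt_is_str", isinstance(batch.get("prompt"), str))
    return result
```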

fastvideo.pipelines.stages.text_encoding.logger[source]#

'init_logger(…)'