fastvideo.pipelines.stages.text_encoding
Prompt encoding stages for diffusion pipelines.
This module implements the stages that encode text prompts into embeddings for diffusion pipelines.
Module Contents#
Classes#
TextEncodingStage – Stage for encoding text prompts into embeddings for diffusion models.
Data#
API#
- class fastvideo.pipelines.stages.text_encoding.TextEncodingStage(text_encoders, tokenizers)[source]#
Bases:
fastvideo.pipelines.stages.base.PipelineStage
Stage for encoding text prompts into embeddings for diffusion models.
This stage handles the encoding of text prompts into the embedding space expected by the diffusion model.
Initialization
Initialize the prompt encoding stage.
- Parameters:
text_encoders – The text encoder models used to produce prompt embeddings.
tokenizers – The tokenizers paired with each text encoder.
- encode_text(text: str | list[str], fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs, encoder_index: int | list[int] | None = None, return_attention_mask: bool = False, return_type: str = 'list', device: torch.device | str | None = None, dtype: torch.dtype | None = None, max_length: int | None = None, truncation: bool | None = None, padding: bool | str | None = None)[source]#
Encode plain text using selected text encoder(s) and return embeddings.
- Parameters:
text – A single string or a list of strings to encode.
fastvideo_args – The inference arguments providing the pipeline config, including tokenizer and encoder settings and the preprocess/postprocess functions.
encoder_index – Encoder selector by index. Accepts an int or a list of ints.
return_attention_mask – If True, also return attention masks for each selected encoder.
return_type – 'list' (default) returns a list aligned with the selection; 'dict' returns a dict keyed by encoder index as a string; 'stack' stacks the embeddings along a new first dimension (requires matching shapes).
device – Optional device override for inputs; defaults to the local torch device.
dtype – Optional dtype to cast the returned embeddings to.
max_length – Optional per-call tokenizer override.
truncation – Optional per-call tokenizer override.
padding – Optional per-call tokenizer override.
- Returns:
list: List[Tensor], or (List[Tensor], List[Tensor]) when return_attention_mask is True
dict: Dict[str, Tensor], or (Dict[str, Tensor], Dict[str, Tensor])
stack: Tensor of shape [num_encoders, …], or a tuple that also includes the stacked attention masks
- Return type:
Varies with return_type and return_attention_mask
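The three return_type modes can be illustrated with a small, self-contained sketch. Plain nested lists stand in for torch tensors, and select_embeddings is a hypothetical helper written for this example only; it mirrors the documented shape of encode_text's return values, not fastvideo's actual implementation.

```python
# Illustrative sketch of the documented return_type contract.
# Plain lists stand in for torch tensors; this is NOT fastvideo code.

def select_embeddings(embeds_by_encoder, encoder_index=None, return_type="list"):
    """Arrange per-encoder embeddings the way encode_text is documented to."""
    # Normalize the selector: None selects all encoders, an int selects one.
    if encoder_index is None:
        encoder_index = list(range(len(embeds_by_encoder)))
    elif isinstance(encoder_index, int):
        encoder_index = [encoder_index]
    selected = [embeds_by_encoder[i] for i in encoder_index]
    if return_type == "list":
        return selected  # list aligned with the selection order
    if return_type == "dict":
        # keyed by encoder index as a string, per the docs above
        return {str(i): e for i, e in zip(encoder_index, selected)}
    if return_type == "stack":
        # with real tensors this would be torch.stack(selected), adding a
        # new first dimension; it requires matching shapes
        return selected
    raise ValueError(f"unknown return_type: {return_type}")

embeds = [[1.0, 2.0], [3.0, 4.0]]  # toy "embeddings" from two encoders
print(select_embeddings(embeds, return_type="list"))      # [[1.0, 2.0], [3.0, 4.0]]
print(select_embeddings(embeds, 1, return_type="dict"))   # {'1': [3.0, 4.0]}
```

With real tensors, the 'stack' mode is the convenient choice when all selected encoders emit embeddings of the same shape; otherwise 'list' or 'dict' avoids the shape constraint.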
- forward(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) fastvideo.pipelines.pipeline_batch_info.ForwardBatch [source]#
Encode the prompt into text encoder hidden states.
- Parameters:
batch β The current batch information.
fastvideo_args β The inference arguments.
- Returns:
The batch with encoded prompt embeddings.
- verify_input(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) fastvideo.pipelines.stages.validators.VerificationResult [source]#
Verify text encoding stage inputs.
- verify_output(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) fastvideo.pipelines.stages.validators.VerificationResult [source]#
Verify text encoding stage outputs.
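Taken together, verify_input, forward, and verify_output describe a stage lifecycle: validate the incoming batch, transform it, then validate the result. The sketch below shows that flow with toy stand-ins; the class names, fields, and the character-code "encoding" are invented for illustration and are not fastvideo's API.

```python
# Hypothetical sketch of a pipeline driver running a text-encoding stage:
# verify inputs, run forward, verify outputs. All names are stand-ins.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VerificationResult:
    ok: bool
    errors: list = field(default_factory=list)

@dataclass
class Batch:
    prompt: Optional[str] = None
    prompt_embeds: Optional[list] = None

class ToyTextEncodingStage:
    def verify_input(self, batch: Batch) -> VerificationResult:
        ok = batch.prompt is not None
        return VerificationResult(ok, [] if ok else ["missing prompt"])

    def forward(self, batch: Batch) -> Batch:
        # Stand-in for real text encoding: map characters to floats.
        batch.prompt_embeds = [float(ord(c)) for c in batch.prompt]
        return batch

    def verify_output(self, batch: Batch) -> VerificationResult:
        return VerificationResult(batch.prompt_embeds is not None)

def run_stage(stage, batch: Batch) -> Batch:
    # A driver would typically log or raise on verification failure.
    assert stage.verify_input(batch).ok, "input verification failed"
    batch = stage.forward(batch)
    assert stage.verify_output(batch).ok, "output verification failed"
    return batch

out = run_stage(ToyTextEncodingStage(), Batch(prompt="hi"))
```

The point of the split is that verification failures surface at stage boundaries, before or after the expensive forward pass, rather than deep inside it.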