fastvideo.pipelines.stages.image_encoding#

Image encoding stages for I2V diffusion pipelines.

This module contains implementations of image encoding stages for diffusion pipelines.

Module Contents#

Classes#

ImageEncodingStage

Stage for encoding image prompts into embeddings for diffusion models.

ImageVAEEncodingStage

Stage for encoding pixel representations into latent space.

Data#

API#

class fastvideo.pipelines.stages.image_encoding.ImageEncodingStage(image_encoder, image_processor)[source]#

Bases: fastvideo.pipelines.stages.base.PipelineStage

Stage for encoding image prompts into embeddings for diffusion models.

This stage handles the encoding of image prompts into the embedding space expected by the diffusion model.

Initialization

Initialize the image encoding stage.

Parameters:
  • image_encoder – The image encoder model used to produce image embeddings.

  • image_processor – The processor that prepares input images for the image encoder.

forward(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) fastvideo.pipelines.pipeline_batch_info.ForwardBatch[source]#

Encode the image prompt into image encoder hidden states.

Parameters:
  • batch – The current batch information.

  • fastvideo_args – The inference arguments.

Returns:

The batch with encoded image prompt embeddings.

verify_input(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) fastvideo.pipelines.stages.validators.VerificationResult[source]#

Verify image encoding stage inputs.

verify_output(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) fastvideo.pipelines.stages.validators.VerificationResult[source]#

Verify image encoding stage outputs.
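The three methods above suggest a check, run, check lifecycle. The following is a minimal sketch of that pattern, not FastVideo's implementation: `ToyBatch`, `ToyImageEncodingStage`, `run`, `toy_processor`, and `toy_encoder` are hypothetical stand-ins, the batch field names are assumptions, and the checks return plain booleans where the real methods return a `VerificationResult`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyBatch:
    """Hypothetical stand-in for ForwardBatch; field names are assumptions."""
    image: Optional[list] = None
    image_embeds: list = field(default_factory=list)


class ToyImageEncodingStage:
    """Hypothetical stage mirroring the documented method names."""

    def __init__(self, image_encoder, image_processor):
        self.image_encoder = image_encoder
        self.image_processor = image_processor

    def verify_input(self, batch):
        # The stage cannot run without an image to encode.
        return batch.image is not None

    def forward(self, batch):
        # Preprocess the image, encode it, and store the result on the batch.
        pixels = self.image_processor(batch.image)
        batch.image_embeds.append(self.image_encoder(pixels))
        return batch

    def verify_output(self, batch):
        # After forward, at least one embedding should be present.
        return len(batch.image_embeds) > 0

    def run(self, batch):
        # Assumed ordering: input check, forward pass, output check.
        if not self.verify_input(batch):
            raise ValueError("input check failed: batch.image is missing")
        batch = self.forward(batch)
        if not self.verify_output(batch):
            raise ValueError("output check failed: no embeddings produced")
        return batch


def toy_processor(image):
    # Scale 0..255 pixel values into the 0..1 range.
    return [p / 255.0 for p in image]


def toy_encoder(pixels):
    # A two-number "embedding": the mean and the maximum pixel value.
    return [sum(pixels) / len(pixels), max(pixels)]


stage = ToyImageEncodingStage(toy_encoder, toy_processor)
result = stage.run(ToyBatch(image=[0.0, 127.5, 255.0]))
print(result.image_embeds)
```

Keeping the checks separate from `forward` lets a pipeline fail early with a clear message when an upstream stage did not populate the batch.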

class fastvideo.pipelines.stages.image_encoding.ImageVAEEncodingStage(vae: fastvideo.models.vaes.common.ParallelTiledVAE)[source]#

Bases: fastvideo.pipelines.stages.base.PipelineStage

Stage for encoding pixel representations into latent space.

This stage handles the encoding of pixel representations into the final input format (e.g., latents).

Initialization

forward(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) fastvideo.pipelines.pipeline_batch_info.ForwardBatch[source]#

Encode pixel representations into latent space.

Parameters:
  • batch – The current batch information.

  • fastvideo_args – The inference arguments.

Returns:

The batch with encoded outputs.

preprocess(image: torch.Tensor | PIL.Image.Image, vae_scale_factor: int, height: int | None = None, width: int | None = None, resize_mode: str = 'default') torch.Tensor[source]#

retrieve_latents(encoder_output: torch.Tensor, generator: torch.Generator | None = None, sample_mode: str = 'sample')[source]#

verify_input(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) fastvideo.pipelines.stages.validators.VerificationResult[source]#

Verify encoding stage inputs.

verify_output(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) fastvideo.pipelines.stages.validators.VerificationResult[source]#

Verify encoding stage outputs.
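The `retrieve_latents` signature takes a `generator` and a `sample_mode` that defaults to `'sample'`, which suggests a choice between drawing a random latent and taking a deterministic one. The sketch below illustrates that idea in pure Python under stated assumptions: `ToyDiagonalGaussian` and `retrieve_latents_sketch` are hypothetical, the `'argmax'` mode name is an assumption borrowed from comparable diffusion libraries, and `random.Random` stands in for `torch.Generator`.

```python
import random


class ToyDiagonalGaussian:
    """Hypothetical stand-in for a VAE encoder's latent distribution."""

    def __init__(self, mean, std):
        self.mean = list(mean)
        self.std = list(std)

    def sample(self, rng):
        # Draw one value per latent dimension from N(mean, std).
        return [rng.gauss(m, s) for m, s in zip(self.mean, self.std)]

    def mode(self):
        # The mode of a Gaussian is its mean.
        return list(self.mean)


def retrieve_latents_sketch(dist, rng=None, sample_mode="sample"):
    """Assumed behaviour: 'sample' draws randomly, 'argmax' returns the mode."""
    if sample_mode == "sample":
        if rng is None:
            rng = random.Random()
        return dist.sample(rng)
    if sample_mode == "argmax":
        return dist.mode()
    raise ValueError(f"unknown sample_mode: {sample_mode!r}")


dist = ToyDiagonalGaussian(mean=[0.0, 1.0], std=[1.0, 0.5])

# Seeding the generator the same way gives the same latent twice.
first = retrieve_latents_sketch(dist, random.Random(0))
second = retrieve_latents_sketch(dist, random.Random(0))

# The deterministic mode ignores the generator entirely.
mode = retrieve_latents_sketch(dist, sample_mode="argmax")
print(first == second, mode)
```

Passing an explicitly seeded generator is what makes sampled latents reproducible across runs; the deterministic mode is useful when run-to-run variation in the conditioning image is unwanted.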

fastvideo.pipelines.stages.image_encoding.logger[source]#

‘init_logger(…)’