fastvideo.pipelines.preprocess.preprocess_pipeline_base

Module Contents

Classes

BasePreprocessPipeline

Base class for preprocessing pipelines that handles common functionality.

Data

API

class fastvideo.pipelines.preprocess.preprocess_pipeline_base.BasePreprocessPipeline(model_path: str, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs | fastvideo.fastvideo_args.TrainingArgs, required_config_modules: list[str] | None = None, loaded_modules: dict[str, torch.nn.Module] | None = None)

Bases: fastvideo.pipelines.composed_pipeline_base.ComposedPipelineBase

Base class for preprocessing pipelines that handles common functionality.

Initialization

Initialize the pipeline. After init, the pipeline should be ready to use. The pipeline should be stateless and not hold any batch state.

create_pipeline_stages(fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs)

Set up pipeline stages with proper dependency injection.

create_record(video_name: str, vae_latent: numpy.ndarray, text_embedding: numpy.ndarray, valid_data: dict[str, Any], idx: int, extra_features: dict[str, Any] | None = None) → dict[str, Any]

Create a record for the Parquet dataset.

create_record_for_schema(preprocess_batch: fastvideo.dataset.preprocessing_datasets.PreprocessBatch, schema: pyarrow.Schema, strict: bool = False) → dict[str, Any]

Create a record for the Parquet dataset using a generic schema-based approach.

Parameters:
  • preprocess_batch – The batch containing the data to extract

  • schema – PyArrow schema defining the expected fields

  • strict – If True, raises an exception when required fields are missing or unfilled

Returns:

Dictionary record matching the schema

Raises:

ValueError – If strict=True and required fields are missing or unfilled
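The strict and lenient behavior described above can be illustrated with a minimal sketch. It is not the library's implementation: `fill_record` and `SchemaFields` are hypothetical names, a plain list of `(name, required)` pairs stands in for the `pyarrow.Schema`, and a plain dict stands in for the `PreprocessBatch`. Filling absent fields with `None` in lenient mode is also an assumption.

```python
from typing import Any

# Hypothetical stand-in for a pyarrow.Schema: (field name, required?) pairs.
SchemaFields = list[tuple[str, bool]]


def fill_record(data: dict[str, Any],
                fields: SchemaFields,
                strict: bool = False) -> dict[str, Any]:
    """Build a record with exactly one key per schema field (illustrative only)."""
    record: dict[str, Any] = {}
    missing: list[str] = []
    for name, required in fields:
        value = data.get(name)
        # Track required fields that the input did not provide.
        if value is None and required:
            missing.append(name)
        # Assumption: absent fields are stored as None rather than dropped.
        record[name] = value
    # Only strict mode turns missing required fields into an error.
    if strict and missing:
        raise ValueError(f"Required fields missing or unfilled: {missing}")
    return record


# Hypothetical field names, chosen only for the example.
fields = [("id", True), ("caption", True), ("fps", False)]
lenient = fill_record({"id": "clip_0"}, fields)
```

With `strict=True`, the same call raises `ValueError` because the required `caption` field is absent.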

forward(batch: fastvideo.pipelines.pipeline_batch_info.ForwardBatch, fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs, args)

get_extra_features(valid_data: dict[str, Any], fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs) → dict[str, Any]

Get additional features specific to the pipeline type. Override in subclasses.

abstract get_pyarrow_schema() → pyarrow.Schema

Return the PyArrow schema for this pipeline. Must be overridden.
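The override pattern implied by `get_extra_features` and the abstract `get_pyarrow_schema` might look like the sketch below. It uses stand-in classes so that it runs without FastVideo or PyArrow installed: `PipelineBaseSketch` and `I2VSketch` are hypothetical, the schema is represented as a plain list of made-up field names, and the empty-dict default for `get_extra_features` is an assumption rather than documented behavior.

```python
from abc import ABC, abstractmethod
from typing import Any


class PipelineBaseSketch(ABC):
    """Hypothetical stand-in for BasePreprocessPipeline; not the real class."""

    @abstractmethod
    def get_pyarrow_schema(self) -> Any:
        """Subclasses must return their schema."""

    def get_extra_features(self, valid_data: dict[str, Any],
                           fastvideo_args: Any) -> dict[str, Any]:
        # Assumed default: a pipeline with no extra features returns nothing.
        return {}


class I2VSketch(PipelineBaseSketch):
    """Hypothetical subclass that adds one pipeline-specific feature."""

    def get_pyarrow_schema(self) -> Any:
        # A real subclass would return a pyarrow.Schema; a list of
        # illustrative field names stands in for it here.
        return ["id", "vae_latent", "text_embedding", "first_frame"]

    def get_extra_features(self, valid_data: dict[str, Any],
                           fastvideo_args: Any) -> dict[str, Any]:
        # Derive the extra feature from the already validated batch data.
        return {"first_frame": valid_data["frames"][0]}


pipeline = I2VSketch()
extra = pipeline.get_extra_features({"frames": ["f0", "f1"]}, fastvideo_args=None)
```

Because `get_pyarrow_schema` is abstract, instantiating the base class directly raises `TypeError`, which is how a missing override would surface early.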

get_schema_fields() → list[str]

Get the schema fields for the pipeline type.

preprocess_video_and_text(fastvideo_args: fastvideo.fastvideo_args.FastVideoArgs, args)

fastvideo.pipelines.preprocess.preprocess_pipeline_base.logger

‘init_logger(…)’