fastvideo.workflow.preprocess.components

Module Contents

Classes

ParquetDatasetSaver

Component for saving and writing Parquet datasets

PreprocessingDataValidator

VideoForwardBatchBuilder

Functions

build_dataset

Data

logger

API

class fastvideo.workflow.preprocess.components.ParquetDatasetSaver(flush_frequency: int, samples_per_file: int, schema_fields: list[str], record_creator: collections.abc.Callable[..., list[dict[str, Any]]], file_writer_fn: collections.abc.Callable | None = None)

Component for saving and writing Parquet datasets

Initialization

Initialize ParquetDatasetSaver

Parameters:
  • schema_fields – List of field names that define the Parquet schema

  • record_creator – Callable that builds the list of record dictionaries to be written

  • file_writer_fn – Function for writing records to files; the default implementation is used if None
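The type annotations suggest that `record_creator` returns a list of plain dictionaries whose keys line up with `schema_fields`. The sketch below assumes exactly that. `SCHEMA_FIELDS`, `make_records`, and `check_schema` are hypothetical names for illustration; the field names fastvideo actually uses are not listed on this page.

```python
from typing import Any

# Hypothetical schema: the real fastvideo field names are not documented here.
SCHEMA_FIELDS: list[str] = ["id", "caption", "num_frames", "width", "height"]


def make_records(samples: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical record_creator: build one record dict per raw sample."""
    records = []
    for sample in samples:
        records.append({
            "id": sample["id"],
            "caption": sample.get("caption", ""),
            "num_frames": int(sample["num_frames"]),
            "width": int(sample["width"]),
            "height": int(sample["height"]),
        })
    return records


def check_schema(records: list[dict[str, Any]], schema_fields: list[str]) -> None:
    """Raise ValueError if any record's keys differ from the expected schema fields."""
    expected = set(schema_fields)
    for i, record in enumerate(records):
        if set(record) != expected:
            raise ValueError(
                f"record {i} has fields {sorted(record)}, expected {sorted(expected)}")


samples = [{"id": "clip_0", "caption": "a cat", "num_frames": 16, "width": 1280, "height": 720}]
records = make_records(samples)
check_schema(records, SCHEMA_FIELDS)  # passes: keys match the schema exactly
```

Validating keys up front makes a mismatch between `record_creator` and `schema_fields` fail loudly at creation time rather than when a Parquet file is written.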

clean_up() → None

Clean up all tables

flush_tables(output_dir: str)

Flush collected tables to disk
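How `flush_frequency` and `samples_per_file` interact is not documented here. The sketch below assumes one plausible reading: records are buffered in memory, and once at least `flush_frequency` records are pending, the buffer is split into files of at most `samples_per_file` rows. `BufferedSaver`, the `(rows, path)` writer signature, and the `data_chunk_N.parquet` naming are all hypothetical.

```python
from typing import Any, Callable

Record = dict[str, Any]


class BufferedSaver:
    """Hypothetical stand-in illustrating flush_frequency and samples_per_file."""

    def __init__(self, flush_frequency: int, samples_per_file: int,
                 file_writer_fn: Callable[[list[Record], str], None]) -> None:
        self.flush_frequency = flush_frequency
        self.samples_per_file = samples_per_file
        self.file_writer_fn = file_writer_fn
        self.buffer: list[Record] = []
        self.files_written = 0

    def add(self, records: list[Record], output_dir: str) -> None:
        # Accumulate records; flush only once the threshold is reached.
        self.buffer.extend(records)
        if len(self.buffer) >= self.flush_frequency:
            self.flush(output_dir)

    def flush(self, output_dir: str) -> None:
        # Write the buffer in chunks of at most samples_per_file records.
        for start in range(0, len(self.buffer), self.samples_per_file):
            chunk = self.buffer[start:start + self.samples_per_file]
            path = f"{output_dir}/data_chunk_{self.files_written}.parquet"
            self.file_writer_fn(chunk, path)
            self.files_written += 1
        self.buffer.clear()


written: list[tuple[str, int]] = []
saver = BufferedSaver(flush_frequency=10, samples_per_file=4,
                      file_writer_fn=lambda rows, path: written.append((path, len(rows))))
saver.add([{"id": i} for i in range(6)], "out")      # 6 < 10: stays buffered
saver.add([{"id": i} for i in range(6, 12)], "out")  # 12 >= 10: flushed as 4 + 4 + 4
```

The writer here only records what it was asked to write. A real one could build a table with `pyarrow.Table.from_pylist(rows)` and write it with `pyarrow.parquet.write_table(table, path)`; whether fastvideo's default writer does this is not stated on this page.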

save_and_write_parquet_batch(batch: fastvideo.pipelines.pipeline_batch_info.PreprocessBatch, output_dir: str, extra_features: dict[str, Any] | None = None) → None

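Save a PreprocessBatch and write it to the Parquet dataset

Parameters:
  • batch – PreprocessBatch containing video and metadata information

  • output_dir – Directory in which the Parquet files are written

  • extra_features – Optional dictionary of additional features to save alongside the batch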

Returns:

Number of processed samples

class fastvideo.workflow.preprocess.components.PreprocessingDataValidator(max_height: int = 1024, max_width: int = 1024, max_h_div_w_ratio: float = 17 / 16, min_h_div_w_ratio: float = 8 / 16, num_frames: int = 16, train_fps: int = 24, speed_factor: float = 1.0, video_length_tolerance_range: float = 5.0, drop_short_ratio: float = 0.0, hw_aspect_threshold: float = 1.5)

Initialization
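With the defaults above, the height-to-width ratio bounds are 8/16 = 0.5 and 17/16 = 1.0625, i.e. from 2:1 landscape up to slightly taller than square. The sketch below assumes these bounds are applied as an inclusive filter on `height / width`; the actual check is not documented on this page, and `aspect_ratio_ok` plus the `"height"`/`"width"` keys are hypothetical.

```python
from typing import Any

# Default bounds copied from the PreprocessingDataValidator signature.
MAX_H_DIV_W_RATIO = 17 / 16  # 1.0625
MIN_H_DIV_W_RATIO = 8 / 16   # 0.5


def aspect_ratio_ok(sample: dict[str, Any],
                    min_ratio: float = MIN_H_DIV_W_RATIO,
                    max_ratio: float = MAX_H_DIV_W_RATIO) -> bool:
    """Hypothetical check: keep a sample if min_ratio <= height / width <= max_ratio."""
    height, width = sample["height"], sample["width"]
    if width <= 0:
        return False  # guard against division by zero on malformed metadata
    return min_ratio <= height / width <= max_ratio


print(aspect_ratio_ok({"height": 720, "width": 1280}))   # 0.5625 -> True
print(aspect_ratio_ok({"height": 1920, "width": 1080}))  # ~1.78 -> False
print(aspect_ratio_ok({"height": 256, "width": 1024}))   # 0.25 -> False
```

Under this reading, a common 16:9 landscape clip passes, while portrait 9:16 video and very wide panoramas are dropped.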

add_validator(name: str, validator: collections.abc.Callable[[dict[str, Any]], bool]) → None

log_validation_stats()

register_validators() → None
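The method names suggest a registry of named predicates, each of type `Callable[[dict[str, Any]], bool]`, plus per-validator rejection statistics. The sketch below assumes that design; `ValidatorRegistry` is hypothetical, it does not reproduce whatever built-in checks `register_validators` installs, and stopping at the first failing check is a design choice of this sketch.

```python
from collections import Counter
from typing import Any, Callable

Validator = Callable[[dict[str, Any]], bool]


class ValidatorRegistry:
    """Hypothetical named-validator registry with failure statistics."""

    def __init__(self) -> None:
        self.validators: dict[str, Validator] = {}
        self.failures: Counter[str] = Counter()
        self.total = 0

    def add_validator(self, name: str, validator: Validator) -> None:
        self.validators[name] = validator

    def __call__(self, sample: dict[str, Any]) -> bool:
        # Run checks in registration order; stop and count the first failure.
        self.total += 1
        for name, validator in self.validators.items():
            if not validator(sample):
                self.failures[name] += 1
                return False
        return True

    def log_validation_stats(self) -> None:
        passed = self.total - sum(self.failures.values())
        print(f"{passed}/{self.total} samples passed")
        for name, count in self.failures.items():
            print(f"  {name}: {count} rejected")


registry = ValidatorRegistry()
registry.add_validator("has_caption", lambda s: bool(s.get("caption")))
registry.add_validator("min_frames", lambda s: s.get("num_frames", 0) >= 16)

samples = [
    {"caption": "a dog running", "num_frames": 32},
    {"caption": "", "num_frames": 32},
    {"caption": "a sunset", "num_frames": 8},
]
kept = [s for s in samples if registry(s)]
registry.log_validation_stats()
```

Because the registry instance is itself a `Callable[[dict[str, Any]], bool]`, it matches the type of the `validator` parameter of `build_dataset`.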

class fastvideo.workflow.preprocess.components.VideoForwardBatchBuilder(seed: int)

Initialization
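The only documented parameter is `seed`. Assuming it exists to make batch construction reproducible, the key property is that two builders created with the same seed produce the same random draws. The sketch below keeps a private `random.Random` instance so it does not touch global RNG state; `SeededBatchBuilder` and `per_sample_seeds` are hypothetical and do not describe what `VideoForwardBatchBuilder` actually samples.

```python
import random


class SeededBatchBuilder:
    """Hypothetical stand-in: a builder that owns a private RNG seeded once."""

    def __init__(self, seed: int) -> None:
        # A private generator isolates this builder from the global random state.
        self.rng = random.Random(seed)

    def per_sample_seeds(self, batch_size: int) -> list[int]:
        # Draw one integer seed per sample in [0, 2**31).
        return [self.rng.randrange(2**31) for _ in range(batch_size)]


builder_a = SeededBatchBuilder(seed=42)
builder_b = SeededBatchBuilder(seed=42)
seeds_a = builder_a.per_sample_seeds(4)
seeds_b = builder_b.per_sample_seeds(4)
# Two builders created with the same seed produce identical draws.
```

In a PyTorch pipeline, the analogous tool would be a dedicated `torch.Generator().manual_seed(seed)` passed to sampling calls.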

fastvideo.workflow.preprocess.components.build_dataset(preprocess_config: fastvideo.configs.configs.PreprocessConfig, split: str, validator: collections.abc.Callable[[dict[str, Any]], bool]) → datasets.Dataset
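`build_dataset` takes a single `validator` predicate over one sample dictionary. When several independent checks are needed, they can be combined into one predicate first. The sketch below is a minimal, library-free illustration; `all_of`, `has_caption`, `is_landscape`, and the sample keys are hypothetical.

```python
from typing import Any, Callable

Validator = Callable[[dict[str, Any]], bool]


def all_of(*validators: Validator) -> Validator:
    """Combine predicates into one that passes only if every predicate passes."""
    def combined(sample: dict[str, Any]) -> bool:
        return all(v(sample) for v in validators)
    return combined


def has_caption(sample: dict[str, Any]) -> bool:
    return bool(sample.get("caption"))


def is_landscape(sample: dict[str, Any]) -> bool:
    return sample["width"] >= sample["height"]


rows = [
    {"caption": "waves", "width": 1280, "height": 720},
    {"caption": "", "width": 1280, "height": 720},
    {"caption": "tower", "width": 720, "height": 1280},
]
validator = all_of(has_caption, is_landscape)
kept = [row for row in rows if validator(row)]  # only the first row survives
```

With Hugging Face `datasets`, a predicate of this shape can be passed to `Dataset.filter`, which by default calls it on one example dictionary at a time; whether `build_dataset` applies the validator that way is an assumption.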

fastvideo.workflow.preprocess.components.logger

‘init_logger(…)’