# fastvideo.workflow.preprocess.components

## Module Contents

### Classes

- `ParquetDatasetSaver` – Component for saving and writing Parquet datasets
- `PreprocessingDataValidator`
- `VideoForwardBatchBuilder`

### Functions

- `build_dataset`

### Data

## API
- class fastvideo.workflow.preprocess.components.ParquetDatasetSaver(flush_frequency: int, samples_per_file: int, schema_fields: list[str], record_creator: collections.abc.Callable[..., list[dict[str, Any]]], file_writer_fn: collections.abc.Callable | None = None)[source]#
Component for saving and writing Parquet datasets
Initialization
Initialize ParquetDatasetSaver
- Parameters:
  - schema_fields – List of schema field names
  - record_creator – Function for creating records
  - file_writer_fn – Function for writing records to files; uses the default implementation if None
- save_and_write_parquet_batch(batch: fastvideo.pipelines.pipeline_batch_info.PreprocessBatch, output_dir: str, extra_features: dict[str, Any] | None = None) → None [source]#
Save and write Parquet dataset batch
- Parameters:
  - batch – PreprocessBatch containing video and metadata information
  - output_dir – Output directory
  - extra_features – Optional extra features
- Returns:
  None (the processed samples are written to files rather than returned)
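The `samples_per_file` parameter suggests records are buffered and flushed to numbered output files in fixed-size chunks. The following is a minimal, hypothetical sketch of that chunking pattern only; it writes JSON instead of Parquet so it needs nothing beyond the standard library, and none of these names come from fastvideo itself.

```python
# Hypothetical sketch: split a list of records into files of at most
# `samples_per_file` rows each, the pattern ParquetDatasetSaver's
# parameters suggest. JSON stands in for Parquet for portability.
import json
import os
import tempfile
from typing import Any


def write_in_chunks(records: list[dict[str, Any]], output_dir: str,
                    samples_per_file: int) -> list[str]:
    """Write `records` to numbered files, `samples_per_file` rows per file."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for i in range(0, len(records), samples_per_file):
        chunk = records[i:i + samples_per_file]
        path = os.path.join(output_dir, f"part-{i // samples_per_file:05d}.json")
        with open(path, "w") as f:
            json.dump(chunk, f)
        paths.append(path)
    return paths


out_dir = tempfile.mkdtemp()
records = [{"id": n, "caption": f"clip {n}"} for n in range(10)]
files = write_in_chunks(records, out_dir, samples_per_file=4)
print(len(files))  # 10 records at 4 per file -> 3 files
```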
- class fastvideo.workflow.preprocess.components.PreprocessingDataValidator(max_height: int = 1024, max_width: int = 1024, max_h_div_w_ratio: float = 17 / 16, min_h_div_w_ratio: float = 8 / 16, num_frames: int = 16, train_fps: int = 24, speed_factor: float = 1.0, video_length_tolerance_range: float = 5.0, drop_short_ratio: float = 0.0, hw_aspect_threshold: float = 1.5)[source]#
Initialization
- class fastvideo.workflow.preprocess.components.VideoForwardBatchBuilder(seed: int)[source]#
Initialization
- fastvideo.workflow.preprocess.components.build_dataset(preprocess_config: fastvideo.configs.configs.PreprocessConfig, split: str, validator: collections.abc.Callable[[dict[str, Any]], bool]) datasets.Dataset [source]#
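`build_dataset` takes a validator of shape `Callable[[dict[str, Any]], bool]`. A sketch of building such a callable and applying it to raw metadata rows, assuming made-up row fields (`path`, `num_frames`) purely for illustration:

```python
# Hypothetical sketch of a validator callable with the signature
# build_dataset expects (dict -> bool), used here to filter raw
# metadata rows. Field names are invented for the example.
from typing import Any, Callable


def make_duration_validator(min_frames: int) -> Callable[[dict[str, Any]], bool]:
    def validator(sample: dict[str, Any]) -> bool:
        return sample.get("num_frames", 0) >= min_frames
    return validator


rows = [
    {"path": "a.mp4", "num_frames": 8},
    {"path": "b.mp4", "num_frames": 32},
    {"path": "c.mp4", "num_frames": 16},
]
validator = make_duration_validator(min_frames=16)
kept = [r for r in rows if validator(r)]
print([r["path"] for r in kept])  # ['b.mp4', 'c.mp4']
```

In fastvideo the equivalent callable would be passed straight to `build_dataset(preprocess_config, split, validator)`.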