preprocess
¶
Modules¶
fastvideo.workflow.preprocess.components
¶
Classes¶
fastvideo.workflow.preprocess.components.ParquetDatasetSaver
¶
ParquetDatasetSaver(flush_frequency: int, samples_per_file: int, schema: Schema, record_creator: Callable[..., list[dict[str, Any]]])
Component for saving and writing Parquet datasets using shared parquet_io.
Source code in fastvideo/workflow/preprocess/components.py
Functions¶
fastvideo.workflow.preprocess.components.ParquetDatasetSaver.clean_up
¶
fastvideo.workflow.preprocess.components.ParquetDatasetSaver.flush_tables
¶
flush_tables(write_remainder: bool = False)
Flush buffered records to disk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_dir | | Directory where parquet files are written. Kept for API symmetry (writer already configured with this path). | required |
| write_remainder | bool | If True, also write any leftover rows smaller than | False |
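The buffer-then-flush behaviour described above can be sketched in plain Python. This is a minimal illustration, not the FastVideo implementation: `BufferedChunkWriter`, its `add` method, and the in-memory `written_files` list are hypothetical stand-ins, and the sketch assumes that "leftover rows" means a final chunk with fewer than `samples_per_file` rows.

```python
from typing import Any


class BufferedChunkWriter:
    """Hypothetical stand-in for a buffered dataset writer (not the FastVideo class)."""

    def __init__(self, samples_per_file: int) -> None:
        if samples_per_file <= 0:
            raise ValueError("samples_per_file must be positive")
        self.samples_per_file = samples_per_file
        self.buffer: list[dict[str, Any]] = []
        # Each inner list stands in for one file that would be written to disk.
        self.written_files: list[list[dict[str, Any]]] = []

    def add(self, records: list[dict[str, Any]]) -> None:
        """Append records to the in-memory buffer without writing anything."""
        self.buffer.extend(records)

    def flush_tables(self, write_remainder: bool = False) -> int:
        """Write every full chunk; optionally write the final partial chunk.

        Returns the number of files written by this call.
        """
        files_before = len(self.written_files)
        while len(self.buffer) >= self.samples_per_file:
            chunk = self.buffer[: self.samples_per_file]
            self.buffer = self.buffer[self.samples_per_file:]
            self.written_files.append(chunk)
        # Assumption: the remainder is whatever is left after all full chunks.
        if write_remainder and self.buffer:
            self.written_files.append(self.buffer)
            self.buffer = []
        return len(self.written_files) - files_before


writer = BufferedChunkWriter(samples_per_file=4)
writer.add([{"id": i} for i in range(10)])
full_files = writer.flush_tables()            # writes only the full chunks
leftover_rows = len(writer.buffer)            # rows still waiting in the buffer
final_files = writer.flush_tables(write_remainder=True)
```

Calling the flush once more with `write_remainder=True` at the end of a run is what keeps the trailing partial chunk from being lost in this sketch.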
fastvideo.workflow.preprocess.components.ParquetDatasetSaver.save_and_write_parquet_batch
¶
save_and_write_parquet_batch(batch: PreprocessBatch, output_dir: str, extra_features: dict[str, Any] | None = None) -> None
Save and write a Parquet dataset batch.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| batch | PreprocessBatch | PreprocessBatch containing video and metadata information | required |
| output_dir | str | Output directory | required |
| extra_features | dict[str, Any] \| None | Extra features | None |
Returns:
| Type | Description |
|---|---|
| None | Number of processed samples |
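The constructor takes a `record_creator` typed as `Callable[..., list[dict[str, Any]]]`, so any callable that returns one dictionary per sample fits that shape. The sketch below shows one such callable. The function name `create_records`, its arguments, the `file_name` and `caption` keys, and the choice to copy every entry of `extra_features` onto each record are all assumptions made for illustration; the source does not say how FastVideo builds records or applies extra features.

```python
from __future__ import annotations

from typing import Any


def create_records(
    video_names: list[str],
    captions: list[str],
    extra_features: dict[str, Any] | None = None,
) -> list[dict[str, Any]]:
    """Hypothetical record creator: one dict per sample."""
    if len(video_names) != len(captions):
        raise ValueError("video_names and captions must have the same length")
    # Assumption: extra features are shared by every record in the batch.
    shared = dict(extra_features or {})
    return [
        {"file_name": name, "caption": caption, **shared}
        for name, caption in zip(video_names, captions)
    ]


records = create_records(
    ["a.mp4", "b.mp4"],
    ["a cat", "a dog"],
    extra_features={"fps": 24},
)
```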
fastvideo.workflow.preprocess.components.PreprocessingDataValidator
¶
PreprocessingDataValidator(max_height: int = 1024, max_width: int = 1024, max_h_div_w_ratio: float = 17 / 16, min_h_div_w_ratio: float = 8 / 16, num_frames: int = 16, train_fps: int = 24, speed_factor: float = 1.0, video_length_tolerance_range: float = 5.0, drop_short_ratio: float = 0.0, hw_aspect_threshold: float = 1.5)
Functions¶
fastvideo.workflow.preprocess.components.PreprocessingDataValidator.__call__
¶
Validate whether the preprocessing data batch is valid.
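The page does not describe which checks `__call__` performs. As a rough illustration of how the resolution-related constructor defaults could be used, the sketch below treats `max_height` and `max_width` as upper bounds and the two ratio parameters as an inclusive range on height divided by width. `ResolutionValidator` and this interpretation are assumptions, not the FastVideo logic, and the frame-count, fps, and speed parameters are left out.

```python
from dataclasses import dataclass


@dataclass
class ResolutionValidator:
    """Hypothetical validator covering only the resolution-related defaults."""

    max_height: int = 1024
    max_width: int = 1024
    max_h_div_w_ratio: float = 17 / 16
    min_h_div_w_ratio: float = 8 / 16

    def __call__(self, height: int, width: int) -> bool:
        # Reject degenerate sizes before dividing.
        if height <= 0 or width <= 0:
            return False
        # Assumption: max_height and max_width are hard upper bounds.
        if height > self.max_height or width > self.max_width:
            return False
        # Assumption: the ratio range is inclusive at both ends.
        ratio = height / width
        return self.min_h_div_w_ratio <= ratio <= self.max_h_div_w_ratio


validator = ResolutionValidator()
ok_landscape = validator(480, 848)     # ratio ~0.57, inside [0.5, 1.0625]
too_large = validator(1080, 1920)      # exceeds the assumed size bounds
too_tall = validator(1024, 512)        # ratio 2.0, above the maximum
```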