fastvideo.v1.dataset.preprocessing_datasets
Module Contents
Classes
DataValidationStage | Stage for validating data items.
DatasetFilterStage | Abstract base class for dataset filtering stages.
DatasetStage | Abstract base class for dataset processing stages.
FrameSamplingStage | Stage for temporal frame sampling and indexing.
ImageTransformStage | Stage for image data transformation.
PreprocessBatch | Batch information for dataset processing stages.
ResolutionFilterStage | Stage for filtering data items based on resolution constraints.
TextEncodingStage | Stage for text tokenization and encoding.
VideoCaptionMergedDataset | Merged dataset for video and caption data with stage-based processing. Assumes that data_merge_path is a txt file with the following format: <folder_path>,<json_file_path>
VideoTransformStage | Stage for video data transformation.
Data
API
- class fastvideo.v1.dataset.preprocessing_datasets.DataValidationStage
Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetFilterStage
Stage for validating data items.
- process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch
Does nothing for validation; filtering is handled by should_keep.
- should_keep(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → bool
Validate a data item.
- Parameters:
batch – Dataset batch to validate
- Returns:
True if valid, False if invalid
- class fastvideo.v1.dataset.preprocessing_datasets.DatasetFilterStage
Bases: abc.ABC
Abstract base class for dataset filtering stages.
These stages can filter out items during metadata processing.
- abstract process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch
Process the dataset batch (for non-filtering operations).
- Parameters:
batch – Dataset batch to process
**kwargs – Additional processing parameters
- Returns:
Processed batch
- abstract should_keep(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → bool
Check if batch should be kept.
- Parameters:
batch – Dataset batch to check
**kwargs – Additional parameters
- Returns:
True if batch should be kept, False otherwise
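A filtering stage implements both methods: process for any non-filtering work and should_keep for the keep/drop decision. The sketch below illustrates that contract with stand-in classes so it runs without FastVideo installed. MiniBatch, MiniFilterStage, MinDurationFilter, and the duration field are hypothetical names chosen for illustration; they are not part of the documented API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class MiniBatch:
    """Hypothetical stand-in for PreprocessBatch; the real fields may differ."""
    path: str
    duration: float  # seconds; assumed field, for illustration only


class MiniFilterStage(ABC):
    """Stand-in that mirrors the documented DatasetFilterStage interface."""

    @abstractmethod
    def process(self, batch: MiniBatch, **kwargs) -> MiniBatch:
        ...

    @abstractmethod
    def should_keep(self, batch: MiniBatch, **kwargs) -> bool:
        ...


class MinDurationFilter(MiniFilterStage):
    """Hypothetical filter: keep clips that are at least min_seconds long."""

    def __init__(self, min_seconds: float) -> None:
        self.min_seconds = min_seconds

    def process(self, batch: MiniBatch, **kwargs) -> MiniBatch:
        # A pure filter leaves the batch untouched.
        return batch

    def should_keep(self, batch: MiniBatch, **kwargs) -> bool:
        return batch.duration >= self.min_seconds


stage = MinDurationFilter(min_seconds=2.0)
items = [MiniBatch("a.mp4", 6.88), MiniBatch("b.mp4", 0.5)]
kept = [stage.process(b) for b in items if stage.should_keep(b)]
```

Keeping the decision in should_keep and the mutation in process lets a caller discard items before doing any expensive work on them.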
- class fastvideo.v1.dataset.preprocessing_datasets.DatasetStage
Bases: abc.ABC
Abstract base class for dataset processing stages.
Similar to PipelineStage but designed for dataset preprocessing operations.
- abstract process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch
Process the dataset batch.
- Parameters:
batch – Dataset batch to process
**kwargs – Additional processing parameters
- Returns:
Processed batch
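Because every processing stage takes a batch and returns a batch, stages can be chained in sequence. The following is a minimal sketch of that idea using stand-in classes. MiniBatch, LowercaseStage, WhitespaceTokenizeStage, and run_stages are hypothetical and only mimic the documented process(batch, **kwargs) signature; how the library itself wires stages together is not shown here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MiniBatch:
    """Hypothetical stand-in for PreprocessBatch."""
    cap: str
    tokens: Optional[List[str]] = None


class LowercaseStage:
    """Hypothetical stage: normalize the caption to lower case."""

    def process(self, batch: MiniBatch, **kwargs) -> MiniBatch:
        batch.cap = batch.cap.lower()
        return batch


class WhitespaceTokenizeStage:
    """Hypothetical stage: split the caption on whitespace."""

    def process(self, batch: MiniBatch, **kwargs) -> MiniBatch:
        batch.tokens = batch.cap.split()
        return batch


def run_stages(batch: MiniBatch, stages, **kwargs) -> MiniBatch:
    # Each stage receives the output of the previous one.
    for stage in stages:
        batch = stage.process(batch, **kwargs)
    return batch


result = run_stages(
    MiniBatch("A Watermelon"),
    [LowercaseStage(), WhitespaceTokenizeStage()],
)
```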
- class fastvideo.v1.dataset.preprocessing_datasets.FrameSamplingStage(num_frames: int, train_fps: int, speed_factor: int = 1, video_length_tolerance_range: float = 5.0, drop_short_ratio: float = 0.0, seed: int = 42)
Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetFilterStage
Stage for temporal frame sampling and indexing.
- process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, temporal_sample_fn=None, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch
Process frame sampling for video data items.
- Parameters:
batch – Dataset batch
temporal_sample_fn – Function for temporal sampling
- Returns:
Updated batch with frame sampling info
- should_keep(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → bool
Check if video should be kept based on length constraints.
- Parameters:
batch – Dataset batch
- Returns:
True if should be kept, False otherwise
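The exact sampling rule is not documented on this page. As a rough illustration of how num_frames, train_fps, and speed_factor could interact, the sketch below assumes a frame interval of source_fps / train_fps * speed_factor and rejects clips that are too short to supply num_frames samples. The function name sample_frame_indices and the rule itself are assumptions, not the library's implementation; video_length_tolerance_range and drop_short_ratio are not modeled.

```python
from typing import List, Optional


def sample_frame_indices(
    total_frames: int,
    source_fps: float,
    train_fps: int,
    num_frames: int,
    speed_factor: int = 1,
) -> Optional[List[int]]:
    """Return evenly spaced frame indices, or None if the clip is too short.

    Assumed rule (not taken from the library): step through the source
    video every source_fps / train_fps * speed_factor frames.
    """
    interval = source_fps / train_fps * speed_factor
    # Index of the last source frame that would need to be read.
    last_needed = int((num_frames - 1) * interval)
    if last_needed >= total_frames:
        return None
    return [int(i * interval) for i in range(num_frames)]


# A 172-frame clip at 25 fps, resampled to 5 fps, 8 frames.
indices = sample_frame_indices(
    total_frames=172, source_fps=25.0, train_fps=5, num_frames=8
)
# A 20-frame clip cannot supply 8 frames at that interval.
too_short = sample_frame_indices(
    total_frames=20, source_fps=25.0, train_fps=5, num_frames=8
)
```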
- class fastvideo.v1.dataset.preprocessing_datasets.ImageTransformStage(transform, transform_topcrop)
Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetStage
Stage for image data transformation.
- process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch
Transform image data.
- Parameters:
batch – Dataset batch with image information
- Returns:
Batch with transformed image tensor
- class fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch
Batch information for dataset processing stages.
This class holds all the information about a video-caption or image-caption pair as it moves through the processing pipeline. Fields are populated by different stages.
- cond_mask: torch.Tensor | None = None
- input_ids: torch.Tensor | None = None
- pixel_values: torch.Tensor | None = None
- class fastvideo.v1.dataset.preprocessing_datasets.ResolutionFilterStage(max_h_div_w_ratio: float = 17 / 16, min_h_div_w_ratio: float = 8 / 16, max_height: int = 1024, max_width: int = 1024)
Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetFilterStage
Stage for filtering data items based on resolution constraints.
- filter_resolution(h: int, w: int, max_h_div_w_ratio: float, min_h_div_w_ratio: float) → bool
Filter based on height/width ratio.
- process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch
Does nothing for resolution filtering; filtering is handled by should_keep.
- should_keep(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → bool
Check if data item passes resolution filtering.
- Parameters:
batch – Dataset batch with resolution information
- Returns:
True if passes filter, False otherwise
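A height/width ratio check with the default bounds from the constructor signature (8 / 16 and 17 / 16) can be sketched as follows. The function name ratio_in_range is hypothetical. It assumes that True means the ratio lies within the bounds and that both bounds are inclusive; neither is stated on this page. The max_height and max_width limits are not modeled.

```python
def ratio_in_range(
    h: int,
    w: int,
    max_h_div_w_ratio: float = 17 / 16,
    min_h_div_w_ratio: float = 8 / 16,
) -> bool:
    """Return True when h / w lies within [min, max] (assumed inclusive)."""
    ratio = h / w
    return min_h_div_w_ratio <= ratio <= max_h_div_w_ratio


landscape = ratio_in_range(1080, 1920)  # ratio 0.5625, inside the bounds
portrait = ratio_in_range(1920, 1080)   # ratio about 1.78, above the maximum
square = ratio_in_range(512, 512)       # ratio 1.0, inside the bounds
```

With these defaults, landscape and square inputs pass while tall portrait inputs are rejected.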
- class fastvideo.v1.dataset.preprocessing_datasets.TextEncodingStage(tokenizer, text_max_length: int, cfg_rate: float = 0.0, seed: int = 42)
Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetStage
Stage for text tokenization and encoding.
- process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch
Process text data.
- Parameters:
batch – Dataset batch with caption information
- Returns:
Batch with encoded text information
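The meaning of cfg_rate is not spelled out on this page. A common convention in classifier-free guidance training is to replace the caption with an empty string with probability cfg_rate before tokenizing. The sketch below shows that convention only; maybe_drop_caption is a hypothetical helper, and whether TextEncodingStage behaves this way is an assumption.

```python
import random


def maybe_drop_caption(caption: str, cfg_rate: float, rng: random.Random) -> str:
    """Return "" with probability cfg_rate, otherwise the caption unchanged.

    Assumed behavior, illustrating caption dropout for classifier-free
    guidance; not taken from the library's source.
    """
    return "" if rng.random() < cfg_rate else caption


# A seeded generator makes the dropout pattern reproducible across runs.
rng = random.Random(42)
captions = ["a watermelon is crushed by a hydraulic press"] * 1000
dropped = sum(1 for c in captions if maybe_drop_caption(c, 0.1, rng) == "")
```

With cfg_rate = 0.0 (the documented default) no caption is ever dropped under this convention, because rng.random() never returns a value below zero.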
- class fastvideo.v1.dataset.preprocessing_datasets.VideoCaptionMergedDataset(data_merge_path: str, args, transform, temporal_sample, transform_topcrop, start_idx: int = 0, seed: int = 42)
Bases: torch.utils.data.IterableDataset, torch.distributed.checkpoint.stateful.Stateful
Merged dataset for video and caption data with stage-based processing.
Assumes that data_merge_path is a txt file with the following format:
<folder_path>,<json_file_path>
The folder should contain videos. The json file should be a list of dictionaries with the following format:
[
  {
    "path": "1gGQy4nxyUo-Scene-016.mp4",
    "resolution": { "width": 1920, "height": 1080 },
    "size": 2439112,
    "fps": 25.0,
    "duration": 6.88,
    "num_frames": 172,
    "cap": [
      "A watermelon wearing a helmet is crushed by a hydraulic press, causing it to flatten and burst open."
    ]
  },
  ...
]
This dataset processes video and image data through a series of stages:
1. Data validation
2. Resolution filtering
3. Frame sampling
4. Transformation
5. Text encoding
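The data_merge_path layout described above (a text file of <folder_path>,<json_file_path> entries, each JSON file holding a list of dictionaries) can be read with the standard library alone. The sketch below is one way to do it, not the dataset's own loader. The helper name load_merge_file is hypothetical, and it assumes one entry per line, splitting on the first comma, so folder paths that contain commas are not handled.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def load_merge_file(data_merge_path: str) -> List[Dict[str, Any]]:
    """Read every <folder_path>,<json_file_path> line and collect the items."""
    items: List[Dict[str, Any]] = []
    with open(data_merge_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            folder, json_path = (part.strip() for part in line.split(",", 1))
            with open(json_path, encoding="utf-8") as jf:
                for entry in json.load(jf):
                    entry = dict(entry)
                    # Resolve the relative video path against its folder.
                    entry["path"] = os.path.join(folder, entry["path"])
                    items.append(entry)
    return items


# Build a tiny merge file in a temporary directory and load it.
with tempfile.TemporaryDirectory() as tmp:
    meta_path = os.path.join(tmp, "videos.json")
    with open(meta_path, "w", encoding="utf-8") as f:
        json.dump(
            [{"path": "1gGQy4nxyUo-Scene-016.mp4", "fps": 25.0,
              "num_frames": 172, "cap": ["A watermelon is crushed."]}],
            f,
        )
    merge_path = os.path.join(tmp, "merge.txt")
    with open(merge_path, "w", encoding="utf-8") as f:
        f.write(f"{tmp},{meta_path}\n")
    items = load_merge_file(merge_path)
```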
- class fastvideo.v1.dataset.preprocessing_datasets.VideoTransformStage(transform)
Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetStage
Stage for video data transformation.
- process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch
Transform video data.
- Parameters:
batch – Dataset batch with video information
- Returns:
Batch with transformed video tensor