fastvideo.v1.dataset.preprocessing_datasets

Module Contents#

Classes#

`DataValidationStage`	Stage for validating data items.
`DatasetFilterStage`	Abstract base class for dataset filtering stages.
`DatasetStage`	Abstract base class for dataset processing stages.
`FrameSamplingStage`	Stage for temporal frame sampling and indexing.
`ImageTransformStage`	Stage for image data transformation.
`PreprocessBatch`	Batch information for dataset processing stages.
`ResolutionFilterStage`	Stage for filtering data items based on resolution constraints.
`TextEncodingStage`	Stage for text tokenization and encoding.
`VideoCaptionMergedDataset`	Merged dataset for video and caption data with stage-based processing. Assumes that data_merge_path is a txt file with the following format: <folder_path>,<json_file_path>
`VideoTransformStage`	Stage for video data transformation.

Data#

logger

API#

class fastvideo.v1.dataset.preprocessing_datasets.DataValidationStage[source]#

Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetFilterStage

Stage for validating data items.

process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch[source]#

No-op for validation; filtering is handled by should_keep.

should_keep(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → bool[source]#

Validate data item.

Parameters:

batch – Dataset batch to validate

Returns:

True if valid, False if invalid

class fastvideo.v1.dataset.preprocessing_datasets.DatasetFilterStage[source]#

Bases: abc.ABC

Abstract base class for dataset filtering stages.

These stages can filter out items during metadata processing.

abstract process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch[source]#

Process the dataset batch (for non-filtering operations).

Parameters:

batch – Dataset batch to process
**kwargs – Additional processing parameters

Returns:

Processed batch

abstract should_keep(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → bool[source]#

Check if batch should be kept.

Parameters:

batch – Dataset batch to check
**kwargs – Additional parameters

Returns:

True if batch should be kept, False otherwise
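
The two abstract methods above can be illustrated with a minimal, self-contained sketch. It does not import FastVideo: `Item` and `FilterStage` are hypothetical stand-ins for `PreprocessBatch` and `DatasetFilterStage`, and `MinDurationFilter` is an invented example, not a stage shipped by the library.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Hypothetical stand-in for PreprocessBatch with only the fields used here."""
    path: str
    duration: Optional[float] = None


class FilterStage(ABC):
    """Stand-in mirroring the two abstract methods documented above."""

    @abstractmethod
    def process(self, batch: Item, **kwargs) -> Item: ...

    @abstractmethod
    def should_keep(self, batch: Item, **kwargs) -> bool: ...


class MinDurationFilter(FilterStage):
    """Hypothetical filter: keep items that last at least min_seconds."""

    def __init__(self, min_seconds: float) -> None:
        self.min_seconds = min_seconds

    def process(self, batch: Item, **kwargs) -> Item:
        # A pure filter has nothing to transform, so the batch is returned as-is.
        return batch

    def should_keep(self, batch: Item, **kwargs) -> bool:
        # Items with no known duration are rejected in this sketch.
        return batch.duration is not None and batch.duration >= self.min_seconds


items = [Item("a.mp4", 6.88), Item("b.mp4", 0.5), Item("c.mp4")]
stage = MinDurationFilter(min_seconds=2.0)
kept = [stage.process(it) for it in items if stage.should_keep(it)]
kept_paths = [it.path for it in kept]
```

Keeping the keep/drop decision in `should_keep` and leaving `process` as a pass-through matches how `DataValidationStage` and `ResolutionFilterStage` describe their own `process` methods.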

class fastvideo.v1.dataset.preprocessing_datasets.DatasetStage[source]#

Bases: abc.ABC

Abstract base class for dataset processing stages.

Similar to PipelineStage but designed for dataset preprocessing operations.

abstract process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch[source]#

Process the dataset batch.

Parameters:

batch – Dataset batch to process
**kwargs – Additional processing parameters

Returns:

Processed batch

class fastvideo.v1.dataset.preprocessing_datasets.FrameSamplingStage(num_frames: int, train_fps: int, speed_factor: int = 1, video_length_tolerance_range: float = 5.0, drop_short_ratio: float = 0.0, seed: int = 42)[source]#

Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetFilterStage

Stage for temporal frame sampling and indexing.

Initialization

process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, temporal_sample_fn=None, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch[source]#

Process frame sampling for video data items.

Parameters:

batch – Dataset batch
temporal_sample_fn – Function for temporal sampling

Returns:

Updated batch with frame sampling info

should_keep(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → bool[source]#

Check if video should be kept based on length constraints.

Parameters:

batch – Dataset batch

Returns:

True if should be kept, False otherwise
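
One plausible way to derive frame indices from `num_frames` and `train_fps` is sketched below. This is an assumption about the general technique (striding through the source video at `fps / train_fps`), not a copy of what `FrameSamplingStage` does; `sample_frame_indices` is a hypothetical helper, and `speed_factor`, `video_length_tolerance_range`, and `drop_short_ratio` are deliberately left out.

```python
def sample_frame_indices(total_frames: int, fps: float, train_fps: int,
                         num_frames: int) -> list[int]:
    """Pick up to num_frames source-frame indices spaced to approximate train_fps."""
    interval = fps / train_fps  # source frames per sampled frame
    indices = []
    for i in range(num_frames):
        idx = int(i * interval)
        if idx >= total_frames:
            break  # the video is too short to supply more frames
        indices.append(idx)
    return indices


same_rate = sample_frame_indices(total_frames=172, fps=25.0, train_fps=25, num_frames=4)
half_rate = sample_frame_indices(total_frames=10, fps=30.0, train_fps=15, num_frames=8)
```

When fewer than `num_frames` indices come back, as in the second call, a length-based filter would have grounds to drop the item.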

class fastvideo.v1.dataset.preprocessing_datasets.ImageTransformStage(transform, transform_topcrop)[source]#

Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetStage

Stage for image data transformation.

Initialization

process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch[source]#

Transform image data.

Parameters:

batch – Dataset batch with image information

Returns:

Batch with transformed image tensor

class fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch[source]#

Batch information for dataset processing stages.

This class holds all the information about a video-caption or image-caption pair as it moves through the processing pipeline. Fields are populated by different stages.

cap: str | list[str][source]#: None

cond_mask: torch.Tensor | None[source]#: None

duration: float | None[source]#: None

fps: float | None[source]#: None

input_ids: torch.Tensor | None[source]#: None

property is_image: bool[source]#: Check if this is an image item.

property is_video: bool[source]#: Check if this is a video item.

num_frames: int | None[source]#: None

path: str[source]#: None

pixel_values: torch.Tensor | None[source]#: None

resolution: dict | None[source]#: None

sample_frame_index: list[int] | None[source]#: None

sample_num_frames: int | None[source]#: None

text: str | None[source]#: None

class fastvideo.v1.dataset.preprocessing_datasets.ResolutionFilterStage(max_h_div_w_ratio: float = 17 / 16, min_h_div_w_ratio: float = 8 / 16, max_height: int = 1024, max_width: int = 1024)[source]#

Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetFilterStage

Stage for filtering data items based on resolution constraints.

Initialization

filter_resolution(h: int, w: int, max_h_div_w_ratio: float, min_h_div_w_ratio: float) → bool[source]#

Filter based on height/width ratio.

process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch[source]#

No-op for resolution filtering; filtering is handled by should_keep.

should_keep(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → bool[source]#

Check if data item passes resolution filtering.

Parameters:

batch – Dataset batch with resolution information

Returns:

True if passes filter, False otherwise
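
The height/width ratio check can be sketched as follows. `aspect_ratio_ok` is a hypothetical helper that reuses the default ratio bounds from the constructor signature; treating both bounds as inclusive is an assumption, and the sketch does not model how `max_height` and `max_width` are applied.

```python
def aspect_ratio_ok(h: int, w: int,
                    max_h_div_w_ratio: float = 17 / 16,
                    min_h_div_w_ratio: float = 8 / 16) -> bool:
    """Return True when h / w lies within [min_h_div_w_ratio, max_h_div_w_ratio]."""
    if h <= 0 or w <= 0:
        return False  # a missing or invalid resolution cannot pass the check
    return min_h_div_w_ratio <= h / w <= max_h_div_w_ratio


landscape = aspect_ratio_ok(1080, 1920)  # h / w = 0.5625
portrait = aspect_ratio_ok(1920, 1080)   # h / w is about 1.78
```

Under these assumptions a 1920x1080 landscape clip is inside the default range, while the same clip rotated to portrait is outside it.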

class fastvideo.v1.dataset.preprocessing_datasets.TextEncodingStage(tokenizer, text_max_length: int, cfg_rate: float = 0.0, seed: int = 42)[source]#

Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetStage

Stage for text tokenization and encoding.

Initialization

process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch[source]#

Process text data.

Parameters:

batch – Dataset batch with caption information

Returns:

Batch with encoded text information
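
The `cfg_rate` and `seed` parameters suggest seeded caption dropout for classifier-free guidance. The sketch below shows that general technique under stated assumptions: `pick_caption` is a hypothetical helper, replacing the caption with an empty string is assumed rather than confirmed by this page, and tokenization is omitted.

```python
import random


def pick_caption(cap, cfg_rate: float, rng: random.Random) -> str:
    """Choose one caption, then blank it with probability cfg_rate."""
    # `cap` may be a single string or a list of alternative captions.
    text = rng.choice(cap) if isinstance(cap, list) else cap
    # rng.random() is in [0, 1), so cfg_rate=0.0 never drops and 1.0 always drops.
    return "" if rng.random() < cfg_rate else text


rng = random.Random(42)
always_kept = pick_caption(["A watermelon is crushed."], cfg_rate=0.0, rng=rng)
always_dropped = pick_caption("A watermelon is crushed.", cfg_rate=1.0, rng=rng)
```

Passing an explicit `random.Random` instance keeps the dropout reproducible without touching the global random state.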

class fastvideo.v1.dataset.preprocessing_datasets.VideoCaptionMergedDataset(data_merge_path: str, args, transform, temporal_sample, transform_topcrop, start_idx: int = 0, seed: int = 42)[source]#

Bases: torch.utils.data.IterableDataset, torch.distributed.checkpoint.stateful.Stateful

Merged dataset for video and caption data with stage-based processing. Assumes that data_merge_path is a txt file with the following format: <folder_path>,<json_file_path>

The folder should contain videos.

The json file should be a list of dictionaries with the following format:
[
    {
        "path": "1gGQy4nxyUo-Scene-016.mp4",
        "resolution": {
            "width": 1920,
            "height": 1080
        },
        "size": 2439112,
        "fps": 25.0,
        "duration": 6.88,
        "num_frames": 172,
        "cap": [
            "A watermelon wearing a helmet is crushed by a hydraulic press, causing it to flatten and burst open."
        ]
    },
    ...
]
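
A minimal sketch of reading such a merge file follows. `read_merge_file` is a hypothetical helper, not the dataset's own loader; it assumes one `<folder_path>,<json_file_path>` pair per line and that each entry's `path` is relative to the folder. The temporary files exist only to make the example runnable.

```python
import json
import os
import tempfile


def read_merge_file(data_merge_path: str) -> list[dict]:
    """Expand every <folder_path>,<json_file_path> line into metadata entries."""
    items = []
    with open(data_merge_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            folder, json_path = (part.strip() for part in line.split(",", 1))
            with open(json_path, encoding="utf-8") as jf:
                for entry in json.load(jf):
                    # Resolve the video path against its folder.
                    items.append({**entry, "path": os.path.join(folder, entry["path"])})
    return items


with tempfile.TemporaryDirectory() as tmp:
    video_dir = os.path.join(tmp, "videos")
    os.makedirs(video_dir)
    meta_path = os.path.join(tmp, "meta.json")
    with open(meta_path, "w", encoding="utf-8") as f:
        json.dump([{"path": "clip.mp4",
                    "resolution": {"width": 1920, "height": 1080},
                    "fps": 25.0, "duration": 6.88, "num_frames": 172,
                    "cap": ["A short clip."]}], f)
    merge_path = os.path.join(tmp, "merge.txt")
    with open(merge_path, "w", encoding="utf-8") as f:
        f.write(f"{video_dir},{meta_path}\n")

    items = read_merge_file(merge_path)
    expected_path = os.path.join(video_dir, "clip.mp4")
```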

This dataset processes video and image data through a series of stages:

1. Data validation
2. Resolution filtering
3. Frame sampling
4. Transformation
5. Text encoding
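
The stage sequence above can be sketched as a filter-then-transform loop. Everything here is a hypothetical stand-in working on plain dicts (`HasCaption`, `NormalizeCaption`, `run_stages`); how the dataset actually schedules its stages is not specified on this page, so read it as one reasonable arrangement.

```python
class HasCaption:
    """Hypothetical filter stage: drop items that have no caption."""

    def should_keep(self, item: dict) -> bool:
        return bool(item.get("cap"))

    def process(self, item: dict) -> dict:
        return item  # filters leave the item unchanged


class NormalizeCaption:
    """Hypothetical transform stage: store one stripped caption under 'text'."""

    def process(self, item: dict) -> dict:
        cap = item["cap"]
        text = cap[0] if isinstance(cap, list) else cap
        return {**item, "text": text.strip()}


def run_stages(items, filter_stages, transform_stages):
    """Drop items rejected by any filter, then run every stage's process()."""
    results = []
    for item in items:
        if not all(stage.should_keep(item) for stage in filter_stages):
            continue
        for stage in filter_stages + transform_stages:
            item = stage.process(item)
        results.append(item)
    return results


items = [
    {"path": "a.mp4", "cap": ["  A watermelon is crushed.  "]},
    {"path": "b.mp4", "cap": []},
]
processed = run_stages(items, [HasCaption()], [NormalizeCaption()])
```

Running the cheap filters before any transformation avoids decoding or tokenizing items that will be discarded anyway.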

Initialization

load_state_dict(state_dict: dict[str, Any]) → None[source]#

Load state dict from checkpoint.

state_dict() → dict[str, Any][source]#

Return state dict for checkpointing.
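
The checkpointing pair can be illustrated with a pure-Python sketch of a resumable iterable. `ResumableItems` and the `"idx"` key are hypothetical; the real dataset's state contents are not documented here, and this sketch avoids importing torch so that it runs anywhere.

```python
from typing import Any


class ResumableItems:
    """Hypothetical iterable that can save and restore its read position."""

    def __init__(self, items, start_idx: int = 0) -> None:
        self.items = list(items)
        self.idx = start_idx

    def __iter__(self):
        while self.idx < len(self.items):
            item = self.items[self.idx]
            self.idx += 1  # advance before yielding so the saved state points at the next item
            yield item

    def state_dict(self) -> dict[str, Any]:
        return {"idx": self.idx}

    def load_state_dict(self, state_dict: dict[str, Any]) -> None:
        self.idx = state_dict["idx"]


first_run = ResumableItems(["a", "b", "c", "d", "e"])
it = iter(first_run)
consumed = [next(it), next(it)]
checkpoint = first_run.state_dict()

second_run = ResumableItems(["a", "b", "c", "d", "e"])
second_run.load_state_dict(checkpoint)
remaining = list(second_run)
```

Restoring the saved index lets a restarted job continue from the first unseen item instead of replaying the whole dataset.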

class fastvideo.v1.dataset.preprocessing_datasets.VideoTransformStage(transform)[source]#

Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetStage

Stage for video data transformation.

Initialization

process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) → fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch[source]#

Transform video data.

Parameters:

batch – Dataset batch with video information

Returns:

Batch with transformed video tensor

fastvideo.v1.dataset.preprocessing_datasets.logger[source]#: ‘init_logger(…)’
