fastvideo.v1.dataset.preprocessing_datasets

Module Contents

Classes

DataValidationStage

Stage for validating data items.

DatasetFilterStage

Abstract base class for dataset filtering stages.

DatasetStage

Abstract base class for dataset processing stages.

FrameSamplingStage

Stage for temporal frame sampling and indexing.

ImageTransformStage

Stage for image data transformation.

PreprocessBatch

Batch information for dataset processing stages.

ResolutionFilterStage

Stage for filtering data items based on resolution constraints.

TextEncodingStage

Stage for text tokenization and encoding.

VideoCaptionMergedDataset

Merged dataset for video and caption data with stage-based processing. Assumes that data_merge_path is a txt file with the following format: <folder_path>,<json_file_path>

VideoTransformStage

Stage for video data transformation.

Data

API

class fastvideo.v1.dataset.preprocessing_datasets.DataValidationStage

Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetFilterStage

Stage for validating data items.

process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) -> fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch

No-op for validation: the batch is returned unchanged, and filtering is handled by should_keep.

should_keep(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) -> bool

Validate data item.

Parameters:

batch – Dataset batch to validate

Returns:

True if valid, False if invalid

class fastvideo.v1.dataset.preprocessing_datasets.DatasetFilterStage

Bases: abc.ABC

Abstract base class for dataset filtering stages.

These stages can filter out items during metadata processing.

abstract process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) -> fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch

Process the dataset batch (for non-filtering operations).

Parameters:
  • batch – Dataset batch to process

  • **kwargs – Additional processing parameters

Returns:

Processed batch

abstract should_keep(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) -> bool

Check if batch should be kept.

Parameters:
  • batch – Dataset batch to check

  • **kwargs – Additional parameters

Returns:

True if batch should be kept, False otherwise
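The two-method contract above can be illustrated with a minimal, self-contained sketch. It does not import fastvideo: `Batch` and `FilterStage` are hypothetical stand-ins for PreprocessBatch and DatasetFilterStage, and `MinDurationFilter` is an invented example stage, not part of the library.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Batch:
    """Hypothetical stand-in for PreprocessBatch (two fields only)."""
    path: str
    duration: float | None = None


class FilterStage(ABC):
    """Stand-in mirroring the two abstract methods documented above."""

    @abstractmethod
    def process(self, batch: Batch, **kwargs) -> Batch: ...

    @abstractmethod
    def should_keep(self, batch: Batch, **kwargs) -> bool: ...


class MinDurationFilter(FilterStage):
    """Hypothetical stage: keep items that last at least `min_seconds`."""

    def __init__(self, min_seconds: float) -> None:
        self.min_seconds = min_seconds

    def process(self, batch: Batch, **kwargs) -> Batch:
        # A pure filter has nothing to transform, so return the batch as-is.
        return batch

    def should_keep(self, batch: Batch, **kwargs) -> bool:
        return batch.duration is not None and batch.duration >= self.min_seconds


stage = MinDurationFilter(min_seconds=2.0)
items = [Batch("a.mp4", 6.88), Batch("b.mp4", 1.2), Batch("c.mp4")]
kept = [stage.process(b) for b in items if stage.should_keep(b)]
print([b.path for b in kept])
```

Keeping the keep/drop decision in should_keep and any mutation in process lets a caller discard items during metadata processing without running the transformation step.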

class fastvideo.v1.dataset.preprocessing_datasets.DatasetStage

Bases: abc.ABC

Abstract base class for dataset processing stages.

Similar to PipelineStage but designed for dataset preprocessing operations.

abstract process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) -> fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch

Process the dataset batch.

Parameters:
  • batch – Dataset batch to process

  • **kwargs – Additional processing parameters

Returns:

Processed batch

class fastvideo.v1.dataset.preprocessing_datasets.FrameSamplingStage(num_frames: int, train_fps: int, speed_factor: int = 1, video_length_tolerance_range: float = 5.0, drop_short_ratio: float = 0.0, seed: int = 42)

Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetFilterStage

Stage for temporal frame sampling and indexing.

Initialization

process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, temporal_sample_fn=None, **kwargs) -> fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch

Process frame sampling for video data items.

Parameters:
  • batch – Dataset batch

  • temporal_sample_fn – Function for temporal sampling

Returns:

Updated batch with frame sampling info
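The page does not specify how frame indices are chosen. As a rough illustration only, one common scheme strides through the source video at `fps / train_fps` (scaled by `speed_factor`) and takes `num_frames` indices. The function name `sample_frame_indices` and the stride rule below are assumptions, not the library's documented behaviour.

```python
from __future__ import annotations


def sample_frame_indices(total_frames: int, fps: float, train_fps: int,
                         num_frames: int, speed_factor: int = 1) -> list[int] | None:
    """Assumed scheme: pick `num_frames` indices at a stride of fps / train_fps.

    Returns None when the video is too short to supply every index.
    """
    # Never upsample: a source slower than train_fps is read frame by frame.
    stride = max(fps / train_fps, 1.0) * speed_factor
    indices = [int(i * stride) for i in range(num_frames)]
    if not indices or indices[-1] >= total_frames:
        return None
    return indices


print(sample_frame_indices(172, 25.0, 25, 5))
print(sample_frame_indices(172, 50.0, 25, 4))
print(sample_frame_indices(3, 25.0, 25, 5))
```

Under this assumption, the resulting list would correspond to the sample_frame_index field of PreprocessBatch and its length to sample_num_frames.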

should_keep(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) -> bool

Check if video should be kept based on length constraints.

Parameters:

batch – Dataset batch

Returns:

True if should be kept, False otherwise

class fastvideo.v1.dataset.preprocessing_datasets.ImageTransformStage(transform, transform_topcrop)

Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetStage

Stage for image data transformation.

Initialization

process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) -> fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch

Transform image data.

Parameters:

batch – Dataset batch with image information

Returns:

Batch with transformed image tensor

class fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch

Batch information for dataset processing stages.

This class holds all the information about a video-caption or image-caption pair as it moves through the processing pipeline. Fields are populated by different stages.

cap: str | list[str]

None

cond_mask: torch.Tensor | None

None

duration: float | None

None

fps: float | None

None

input_ids: torch.Tensor | None

None

property is_image: bool

Check if this is an image item.

property is_video: bool

Check if this is a video item.

num_frames: int | None

None

path: str

None

pixel_values: torch.Tensor | None

None

resolution: dict | None

None

sample_frame_index: list[int] | None

None

sample_num_frames: int | None

None

text: str | None

None

class fastvideo.v1.dataset.preprocessing_datasets.ResolutionFilterStage(max_h_div_w_ratio: float = 17 / 16, min_h_div_w_ratio: float = 8 / 16, max_height: int = 1024, max_width: int = 1024)

Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetFilterStage

Stage for filtering data items based on resolution constraints.

Initialization

filter_resolution(h: int, w: int, max_h_div_w_ratio: float, min_h_div_w_ratio: float) -> bool

Filter based on height/width ratio.
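As a worked example of a height/width ratio check, the sketch below assumes the rule is an inclusive range test, `min_h_div_w_ratio <= h / w <= max_h_div_w_ratio`. That rule and the helper name `ratio_in_range` are assumptions for illustration. The default bounds are the ones shown in the ResolutionFilterStage signature.

```python
def ratio_in_range(h: int, w: int,
                   max_h_div_w_ratio: float = 17 / 16,
                   min_h_div_w_ratio: float = 8 / 16) -> bool:
    """Assumed rule: keep an item when its h / w ratio lies within the bounds."""
    if w <= 0:
        return False  # guard against division by zero on malformed metadata
    return min_h_div_w_ratio <= h / w <= max_h_div_w_ratio


# Landscape 1920x1080: h / w = 0.5625, inside [0.5, 1.0625].
print(ratio_in_range(h=1080, w=1920))
# Portrait 1080x1920: h / w is about 1.78, above the upper bound.
print(ratio_in_range(h=1920, w=1080))
```

With the default bounds, this rule would accept landscape and near-square items and reject tall portrait ones.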

process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) -> fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch

No-op for resolution filtering: the batch is returned unchanged, and filtering is handled by should_keep.

should_keep(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) -> bool

Check if data item passes resolution filtering.

Parameters:

batch – Dataset batch with resolution information

Returns:

True if passes filter, False otherwise

class fastvideo.v1.dataset.preprocessing_datasets.TextEncodingStage(tokenizer, text_max_length: int, cfg_rate: float = 0.0, seed: int = 42)

Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetStage

Stage for text tokenization and encoding.

Initialization

process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) -> fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch

Process text data.

Parameters:

batch – Dataset batch with caption information

Returns:

Batch with encoded text information
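The page does not describe how `cfg_rate` is used. A common convention for classifier-free guidance training is to replace the caption with an empty string with probability `cfg_rate` before tokenizing. The sketch below assumes that convention and uses a toy whitespace tokenizer. `encode_caption` and the `<pad>` token are hypothetical, not the library's implementation.

```python
import random


def encode_caption(caption: str, text_max_length: int, cfg_rate: float,
                   rng: random.Random) -> tuple[str, list[str], list[int]]:
    """Assumed behaviour: blank the caption with probability cfg_rate, then pad or truncate."""
    text = "" if rng.random() < cfg_rate else caption
    tokens = text.split()[:text_max_length]      # toy whitespace tokenizer
    pad = text_max_length - len(tokens)
    mask = [1] * len(tokens) + [0] * pad         # 1 marks a real token, 0 marks padding
    return text, tokens + ["<pad>"] * pad, mask


rng = random.Random(42)  # seeded so caption dropout is reproducible
text, tokens, mask = encode_caption("a watermelon is crushed", 6, cfg_rate=0.0, rng=rng)
print(text, tokens, mask)
dropped_text, _, dropped_mask = encode_caption("a watermelon is crushed", 6, cfg_rate=1.0, rng=rng)
print(repr(dropped_text), dropped_mask)
```

In this sketch the padded token list and the mask play the roles of the input_ids and cond_mask fields of PreprocessBatch, which would be tensors in the real stage.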

class fastvideo.v1.dataset.preprocessing_datasets.VideoCaptionMergedDataset(data_merge_path: str, args, transform, temporal_sample, transform_topcrop, start_idx: int = 0, seed: int = 42)

Bases: torch.utils.data.IterableDataset, torch.distributed.checkpoint.stateful.Stateful

Merged dataset for video and caption data with stage-based processing. Assumes that data_merge_path is a txt file with the following format: <folder_path>,<json_file_path>

The folder should contain videos.

The json file should be a list of dictionaries with the following format:
[
    {
        "path": "1gGQy4nxyUo-Scene-016.mp4",
        "resolution": {
            "width": 1920,
            "height": 1080
        },
        "size": 2439112,
        "fps": 25.0,
        "duration": 6.88,
        "num_frames": 172,
        "cap": [
            "A watermelon wearing a helmet is crushed by a hydraulic press, causing it to flatten and burst open."
        ]
    },
    ...
]
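To make the two file formats concrete, here is a minimal sketch that reads a merge file with `<folder_path>,<json_file_path>` lines and expands each JSON entry into a record with a full video path. `read_merge_file` is a hypothetical helper written for this illustration, and the temporary files exist only so the example runs on its own.

```python
import json
import tempfile
from pathlib import Path


def read_merge_file(data_merge_path: str) -> list[dict]:
    """Hypothetical loader: expand each `<folder_path>,<json_file_path>` line."""
    items = []
    for line in Path(data_merge_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        folder, json_file = (part.strip() for part in line.split(",", 1))
        for entry in json.loads(Path(json_file).read_text()):
            # Resolve the relative "path" against the video folder.
            items.append({**entry, "path": str(Path(folder) / entry["path"])})
    return items


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "videos").mkdir()
    meta = [{"path": "clip.mp4", "resolution": {"width": 1920, "height": 1080},
             "fps": 25.0, "duration": 6.88, "num_frames": 172, "cap": ["A caption."]}]
    (root / "meta.json").write_text(json.dumps(meta))
    (root / "merge.txt").write_text(f"{root / 'videos'},{root / 'meta.json'}\n")
    items = read_merge_file(str(root / "merge.txt"))

print(items[0]["path"], items[0]["num_frames"])
```

This sketch splits on the first comma only, so it assumes the folder path itself contains no comma.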

This dataset processes video and image data through a series of stages:

  • Data validation

  • Resolution filtering

  • Frame sampling

  • Transformation

  • Text encoding

Initialization

load_state_dict(state_dict: dict[str, Any]) -> None

Load state dict from checkpoint.

state_dict() -> dict[str, Any]

Return state dict for checkpointing.
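The contents of the state dict are not documented here. The sketch below only illustrates the general state_dict / load_state_dict pattern for resuming iteration mid-stream, using a plain Python class rather than torch. `ResumableItems` and the `"idx"` key are hypothetical.

```python
from typing import Any, Iterator


class ResumableItems:
    """Hypothetical iterable that can checkpoint its read position."""

    def __init__(self, items: list[str], start_idx: int = 0) -> None:
        self.items = list(items)
        self.idx = start_idx

    def __iter__(self) -> Iterator[str]:
        while self.idx < len(self.items):
            item = self.items[self.idx]
            self.idx += 1  # advance before yielding so a checkpoint skips consumed items
            yield item

    def state_dict(self) -> dict[str, Any]:
        return {"idx": self.idx}

    def load_state_dict(self, state_dict: dict[str, Any]) -> None:
        self.idx = state_dict["idx"]


data = ["a.mp4", "b.mp4", "c.mp4", "d.mp4"]
first = ResumableItems(data)
it = iter(first)
consumed = [next(it), next(it)]
checkpoint = first.state_dict()

resumed = ResumableItems(data)
resumed.load_state_dict(checkpoint)
remaining = list(resumed)
print(consumed, checkpoint, remaining)
```

Restoring the saved position lets a restarted job continue with the items that were not yet consumed instead of replaying the dataset from the beginning.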

class fastvideo.v1.dataset.preprocessing_datasets.VideoTransformStage(transform)

Bases: fastvideo.v1.dataset.preprocessing_datasets.DatasetStage

Stage for video data transformation.

Initialization

process(batch: fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch, **kwargs) -> fastvideo.v1.dataset.preprocessing_datasets.PreprocessBatch

Transform video data.

Parameters:

batch – Dataset batch with video information

Returns:

Batch with transformed video tensor

fastvideo.v1.dataset.preprocessing_datasets.logger

‘init_logger(…)’