dataset
¶
Classes¶
fastvideo.dataset.TextDataset
¶
Bases: IterableDataset, Stateful
Text-only dataset for processing prompts from a simple text file.
Assumes that data_merge_path is a text file with one prompt per line:

```text
A cat playing with a ball
A dog running in the park
A person cooking dinner
...
```
This dataset processes text data through text encoding stages only.
Source code in fastvideo/dataset/preprocessing_datasets.py
Functions¶
fastvideo.dataset.TextDataset.__iter__
¶
Iterator for the dataset.
Source code in fastvideo/dataset/preprocessing_datasets.py
fastvideo.dataset.TextDataset.load_state_dict
¶
fastvideo.dataset.TextDataset.state_dict
¶
fastvideo.dataset.ValidationDataset
¶
ValidationDataset(filename: str)
Bases: IterableDataset
Source code in fastvideo/dataset/validation_dataset.py
fastvideo.dataset.VideoCaptionMergedDataset
¶
VideoCaptionMergedDataset(data_merge_path: str, args, transform, temporal_sample, transform_topcrop, start_idx: int = 0, seed: int = 42)
Bases: IterableDataset, Stateful
Merged dataset for video and caption data with stage-based processing.
Assumes that data_merge_path is a txt file in which each line points to a data folder and a JSON annotation file.
The folder should contain videos.
The json file should be a list of dictionaries with the following format:
```json
[
  {
    "path": "1gGQy4nxyUo-Scene-016.mp4",
    "resolution": {
      "width": 1920,
      "height": 1080
    },
    "size": 2439112,
    "fps": 25.0,
    "duration": 6.88,
    "num_frames": 172,
    "cap": [
      "A watermelon wearing a helmet is crushed by a hydraulic press, causing it to flatten and burst open."
    ]
  },
  ...
]
```
This dataset processes video and image data through a series of stages:
- Data validation
- Resolution filtering
- Frame sampling
- Transformation
- Text encoding
Source code in fastvideo/dataset/preprocessing_datasets.py
Modules¶
fastvideo.dataset.parquet_dataset_iterable_style
¶
Classes¶
fastvideo.dataset.parquet_dataset_iterable_style.LatentsParquetIterStyleDataset
¶
LatentsParquetIterStyleDataset(path: str, batch_size: int = 1024, cfg_rate: float = 0.1, num_workers: int = 1, drop_last: bool = True, text_padding_length: int = 512, seed: int = 42, read_batch_size: int = 32, parquet_schema: Schema = None)
Bases: IterableDataset
Efficient loader for video-text data from a directory of Parquet files.
Source code in fastvideo/dataset/parquet_dataset_iterable_style.py
Functions¶
fastvideo.dataset.parquet_dataset_iterable_style.build_parquet_iterable_style_dataloader
¶
build_parquet_iterable_style_dataloader(path: str, batch_size: int, num_data_workers: int, cfg_rate: float = 0.0, drop_last: bool = True, text_padding_length: int = 512, seed: int = 42, read_batch_size: int = 32) -> tuple[LatentsParquetIterStyleDataset, StatefulDataLoader]
Build a dataloader for the LatentsParquetIterStyleDataset.
Source code in fastvideo/dataset/parquet_dataset_iterable_style.py
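Below is a minimal usage sketch. The data directory and batch settings are placeholders, and the comment on cfg_rate is an assumption rather than documented behavior:

```python
# Illustrative sketch; the directory path and batch settings are placeholders.
from fastvideo.dataset.parquet_dataset_iterable_style import (
    build_parquet_iterable_style_dataloader,
)

dataset, dataloader = build_parquet_iterable_style_dataloader(
    path="/data/preprocessed_parquet",  # directory of Parquet files (placeholder)
    batch_size=8,
    num_data_workers=2,
    cfg_rate=0.1,                 # assumed: chance of dropping text conditioning for CFG
    text_padding_length=512,
    seed=42,
)

for batch in dataloader:
    # Each batch bundles latents and padded text embeddings; the exact keys
    # depend on the parquet schema used during preprocessing.
    break
```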
fastvideo.dataset.parquet_dataset_iterable_style.shard_parquet_files_across_sp_groups_and_workers
¶
shard_parquet_files_across_sp_groups_and_workers(path: str, num_sp_groups: int, num_workers: int, seed: int = 42) -> tuple[list[list[str]], list[int], list[dict[str, int]]]
Shard parquet files across SP groups and workers in a balanced way.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Directory containing parquet files | required |
| num_sp_groups | int | Number of SP groups to shard across | required |
| num_workers | int | Number of workers per SP group | required |
| seed | int | Random seed for shuffling | 42 |

Returns:

| Type | Description |
|---|---|
| tuple[list[list[str]], list[int], list[dict[str, int]]] | Tuple containing: the sharded parquet file lists (list[list[str]]), a list[int], and a list[dict[str, int]] |
Source code in fastvideo/dataset/parquet_dataset_iterable_style.py
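A short calling sketch with placeholder paths and shard counts; the meaning of the returned values follows the type annotations above:

```python
# Illustrative sketch; the directory and shard counts are placeholders.
from fastvideo.dataset.parquet_dataset_iterable_style import (
    shard_parquet_files_across_sp_groups_and_workers,
)

shards, lengths, metas = shard_parquet_files_across_sp_groups_and_workers(
    path="/data/preprocessed_parquet",
    num_sp_groups=2,
    num_workers=4,
    seed=42,
)

# shards: list[list[str]] of parquet file paths, one inner list per shard.
# lengths and metas follow the list[int] / list[dict[str, int]] annotations.
print(len(shards), len(lengths), len(metas))
```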
fastvideo.dataset.parquet_dataset_map_style
¶
Classes¶
fastvideo.dataset.parquet_dataset_map_style.DP_SP_BatchSampler
¶
DP_SP_BatchSampler(batch_size: int, dataset_size: int, num_sp_groups: int, sp_world_size: int, global_rank: int, drop_last: bool = True, drop_first_row: bool = False, seed: int = 0)
A simple sequential batch sampler that yields batches of indices.
Source code in fastvideo/dataset/parquet_dataset_map_style.py
fastvideo.dataset.parquet_dataset_map_style.LatentsParquetMapStyleDataset
¶
LatentsParquetMapStyleDataset(path: str, batch_size: int, parquet_schema: Schema, cfg_rate: float = 0.0, seed: int = 42, drop_last: bool = True, drop_first_row: bool = False, text_padding_length: int = 512)
Bases: Dataset
Return latents[B,C,T,H,W] and embeddings[B,L,D] in pinned CPU memory. Note: using parquet for a map-style dataset is not efficient; we keep it mainly for backward compatibility and debugging.
Source code in fastvideo/dataset/parquet_dataset_map_style.py
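A construction sketch pairing the map-style dataset with DP_SP_BatchSampler for a single, non-distributed process. The pyarrow schema below is only a placeholder; the real schema must describe the latent and text-embedding columns produced during preprocessing:

```python
# Illustrative sketch only; the schema and path are placeholders.
import pyarrow as pa
from torch.utils.data import DataLoader

from fastvideo.dataset.parquet_dataset_map_style import (
    DP_SP_BatchSampler,
    LatentsParquetMapStyleDataset,
)

placeholder_schema = pa.schema([("id", pa.string())])  # stand-in, not the real schema

dataset = LatentsParquetMapStyleDataset(
    path="/data/preprocessed_parquet",
    batch_size=4,
    parquet_schema=placeholder_schema,
    text_padding_length=512,
)

sampler = DP_SP_BatchSampler(
    batch_size=4,
    dataset_size=len(dataset),  # assumes the dataset exposes __len__
    num_sp_groups=1,
    sp_world_size=1,
    global_rank=0,
)

# __getitems__ lets the loader fetch a whole batch of indices in one call.
loader = DataLoader(dataset, batch_sampler=sampler, collate_fn=lambda b: b)
```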
Functions¶
fastvideo.dataset.parquet_dataset_map_style.LatentsParquetMapStyleDataset.__getitems__
¶Batch fetch using read_row_from_parquet_file for each index.
Source code in fastvideo/dataset/parquet_dataset_map_style.py
fastvideo.dataset.parquet_dataset_map_style.LatentsParquetMapStyleDataset.get_validation_negative_prompt
¶Get the negative prompt for validation. This method ensures the negative prompt is loaded and cached properly. Returns the processed negative prompt data (latents, embeddings, masks, info).
Source code in fastvideo/dataset/parquet_dataset_map_style.py
Functions¶
fastvideo.dataset.parquet_dataset_map_style.read_row_from_parquet_file
¶
read_row_from_parquet_file(parquet_files: list[str], global_row_idx: int, lengths: list[int]) -> dict[str, Any]
Read a row from a parquet file.
Parameters: parquet_files (list[str]), global_row_idx (int), lengths (list[int]).
Source code in fastvideo/dataset/parquet_dataset_map_style.py
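A sketch of reading one logical row across several shards. The file paths and row counts are placeholders; lengths is assumed to hold the number of rows in each file, in the same order as parquet_files:

```python
from fastvideo.dataset.parquet_dataset_map_style import read_row_from_parquet_file

parquet_files = ["/data/shard-000.parquet", "/data/shard-001.parquet"]  # placeholders
lengths = [1000, 800]  # rows per file (placeholder values)

row = read_row_from_parquet_file(parquet_files, global_row_idx=1234, lengths=lengths)
# `row` is a dict keyed by column name, per the return annotation above.
```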
fastvideo.dataset.preprocessing_datasets
¶
Classes¶
fastvideo.dataset.preprocessing_datasets.DataValidationStage
¶
Bases: DatasetFilterStage
Stage for validating data items.
Functions¶
fastvideo.dataset.preprocessing_datasets.DataValidationStage.process
¶process(batch: PreprocessBatch, **kwargs) -> PreprocessBatch
Process does nothing for validation - filtering is handled by should_keep.
fastvideo.dataset.preprocessing_datasets.DataValidationStage.should_keep
¶should_keep(batch: PreprocessBatch, **kwargs) -> bool
Validate data item.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| batch | PreprocessBatch | Dataset batch to validate | required |

Returns:

| Type | Description |
|---|---|
| bool | True if valid, False if invalid |
Source code in fastvideo/dataset/preprocessing_datasets.py
fastvideo.dataset.preprocessing_datasets.DatasetFilterStage
¶
Bases: ABC
Abstract base class for dataset filtering stages.
These stages can filter out items during metadata processing.
Functions¶
fastvideo.dataset.preprocessing_datasets.DatasetFilterStage.process
abstractmethod
¶process(batch: PreprocessBatch, **kwargs) -> PreprocessBatch
Process the dataset batch (for non-filtering operations).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| batch | PreprocessBatch | Dataset batch to process | required |
| **kwargs | | Additional processing parameters | {} |

Returns:

| Type | Description |
|---|---|
| PreprocessBatch | Processed batch |
Source code in fastvideo/dataset/preprocessing_datasets.py
fastvideo.dataset.preprocessing_datasets.DatasetFilterStage.should_keep
abstractmethod
¶should_keep(batch: PreprocessBatch, **kwargs) -> bool
Check if batch should be kept.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| batch | PreprocessBatch | Dataset batch to check | required |
| **kwargs | | Additional parameters | {} |

Returns:

| Type | Description |
|---|---|
| bool | True if batch should be kept, False otherwise |
Source code in fastvideo/dataset/preprocessing_datasets.py
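To illustrate the contract, here is a hypothetical subclass (MinCaptionLengthStage is not part of fastvideo) that implements the two abstract methods: should_keep decides filtering and process leaves the batch unchanged:

```python
from fastvideo.dataset.preprocessing_datasets import (
    DatasetFilterStage,
    PreprocessBatch,
)

class MinCaptionLengthStage(DatasetFilterStage):
    """Hypothetical filter: drop items whose caption is too short."""

    def __init__(self, min_chars: int = 5) -> None:
        self.min_chars = min_chars

    def process(self, batch: PreprocessBatch, **kwargs) -> PreprocessBatch:
        # Filtering stages typically leave the batch untouched here.
        return batch

    def should_keep(self, batch: PreprocessBatch, **kwargs) -> bool:
        caption = batch.cap if isinstance(batch.cap, str) else " ".join(batch.cap)
        return len(caption.strip()) >= self.min_chars
```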
fastvideo.dataset.preprocessing_datasets.DatasetStage
¶
Bases: ABC
Abstract base class for dataset processing stages.
Similar to PipelineStage but designed for dataset preprocessing operations.
Functions¶
fastvideo.dataset.preprocessing_datasets.DatasetStage.process
abstractmethod
¶process(batch: PreprocessBatch, **kwargs) -> PreprocessBatch
Process the dataset batch.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| batch | PreprocessBatch | Dataset batch to process | required |
| **kwargs | | Additional processing parameters | {} |

Returns:

| Type | Description |
|---|---|
| PreprocessBatch | Processed batch |
Source code in fastvideo/dataset/preprocessing_datasets.py
fastvideo.dataset.preprocessing_datasets.FrameSamplingStage
¶
FrameSamplingStage(num_frames: int, train_fps: int, speed_factor: int = 1, video_length_tolerance_range: float = 5.0, drop_short_ratio: float = 0.0, seed: int = 42)
Bases: DatasetFilterStage
Stage for temporal frame sampling and indexing.
Source code in fastvideo/dataset/preprocessing_datasets.py
Functions¶
fastvideo.dataset.preprocessing_datasets.FrameSamplingStage.process
¶process(batch: PreprocessBatch, temporal_sample_fn=None, **kwargs) -> PreprocessBatch
Process frame sampling for video data items.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| batch | PreprocessBatch | Dataset batch | required |
| temporal_sample_fn | | Function for temporal sampling | None |

Returns:

| Type | Description |
|---|---|
| PreprocessBatch | Updated batch with frame sampling info |
Source code in fastvideo/dataset/preprocessing_datasets.py
fastvideo.dataset.preprocessing_datasets.FrameSamplingStage.should_keep
¶should_keep(batch: PreprocessBatch, **kwargs) -> bool
Check if video should be kept based on length constraints.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| batch | PreprocessBatch | Dataset batch | required |

Returns:

| Type | Description |
|---|---|
| bool | True if should be kept, False otherwise |
Source code in fastvideo/dataset/preprocessing_datasets.py
fastvideo.dataset.preprocessing_datasets.ImageTransformStage
¶
Bases: DatasetStage
Stage for image data transformation.
Source code in fastvideo/dataset/preprocessing_datasets.py
Functions¶
fastvideo.dataset.preprocessing_datasets.ImageTransformStage.process
¶process(batch: PreprocessBatch, **kwargs) -> PreprocessBatch
Transform image data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| batch | PreprocessBatch | Dataset batch with image information | required |

Returns:

| Type | Description |
|---|---|
| PreprocessBatch | Batch with transformed image tensor |
Source code in fastvideo/dataset/preprocessing_datasets.py
fastvideo.dataset.preprocessing_datasets.PreprocessBatch
dataclass
¶
PreprocessBatch(path: str, cap: str | list[str], resolution: dict | None = None, fps: float | None = None, duration: float | None = None, num_frames: int | None = None, sample_frame_index: list[int] | None = None, sample_num_frames: int | None = None, pixel_values: Tensor | None = None, text: str | None = None, input_ids: Tensor | None = None, cond_mask: Tensor | None = None)
Batch information for dataset processing stages.
This class holds all the information about a video-caption or image-caption pair as it moves through the processing pipeline. Fields are populated by different stages.
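A minimal sketch of constructing a PreprocessBatch from the metadata fields shown in the JSON example above; the remaining fields are filled in by later stages:

```python
from fastvideo.dataset.preprocessing_datasets import PreprocessBatch

batch = PreprocessBatch(
    path="1gGQy4nxyUo-Scene-016.mp4",
    cap=["A watermelon wearing a helmet is crushed by a hydraulic press."],
    resolution={"width": 1920, "height": 1080},
    fps=25.0,
    duration=6.88,
    num_frames=172,
)
# sample_frame_index, pixel_values, input_ids, etc. are populated by the stages.
```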
fastvideo.dataset.preprocessing_datasets.ResolutionFilterStage
¶
ResolutionFilterStage(max_h_div_w_ratio: float = 17 / 16, min_h_div_w_ratio: float = 8 / 16, max_height: int = 1024, max_width: int = 1024)
Bases: DatasetFilterStage
Stage for filtering data items based on resolution constraints.
Source code in fastvideo/dataset/preprocessing_datasets.py
Functions¶
fastvideo.dataset.preprocessing_datasets.ResolutionFilterStage.filter_resolution
¶Filter based on height/width ratio.
fastvideo.dataset.preprocessing_datasets.ResolutionFilterStage.process
¶process(batch: PreprocessBatch, **kwargs) -> PreprocessBatch
Process does nothing for resolution filtering - filtering is handled by should_keep.
fastvideo.dataset.preprocessing_datasets.ResolutionFilterStage.should_keep
¶should_keep(batch: PreprocessBatch, **kwargs) -> bool
Check if data item passes resolution filtering.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| batch | PreprocessBatch | Dataset batch with resolution information | required |

Returns:

| Type | Description |
|---|---|
| bool | True if passes filter, False otherwise |
Source code in fastvideo/dataset/preprocessing_datasets.py
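A small sketch of the filter-stage pattern using ResolutionFilterStage; the batch values are illustrative, and whether a given clip passes depends on the thresholds:

```python
from fastvideo.dataset.preprocessing_datasets import (
    PreprocessBatch,
    ResolutionFilterStage,
)

stage = ResolutionFilterStage(max_height=1024, max_width=1024)

batch = PreprocessBatch(
    path="clip.mp4",
    cap=["a short caption"],
    resolution={"width": 1280, "height": 720},
)

if stage.should_keep(batch):
    batch = stage.process(batch)  # no-op here; filtering happens in should_keep
```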
fastvideo.dataset.preprocessing_datasets.TextDataset
¶
Bases: IterableDataset, Stateful
Text-only dataset for processing prompts from a simple text file.
Assumes that data_merge_path is a text file with one prompt per line:

```text
A cat playing with a ball
A dog running in the park
A person cooking dinner
...
```
This dataset processes text data through text encoding stages only.
Source code in fastvideo/dataset/preprocessing_datasets.py
Functions¶
fastvideo.dataset.preprocessing_datasets.TextDataset.__iter__
¶Iterator for the dataset.
Source code in fastvideo/dataset/preprocessing_datasets.py
fastvideo.dataset.preprocessing_datasets.TextDataset.load_state_dict
¶
fastvideo.dataset.preprocessing_datasets.TextDataset.state_dict
¶
fastvideo.dataset.preprocessing_datasets.TextEncodingStage
¶
Bases: DatasetStage
Stage for text tokenization and encoding.
Source code in fastvideo/dataset/preprocessing_datasets.py
Functions¶
fastvideo.dataset.preprocessing_datasets.TextEncodingStage.process
¶process(batch: PreprocessBatch, **kwargs) -> PreprocessBatch
Process text data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| batch | PreprocessBatch | Dataset batch with caption information | required |

Returns:

| Type | Description |
|---|---|
| PreprocessBatch | Batch with encoded text information |
Source code in fastvideo/dataset/preprocessing_datasets.py
fastvideo.dataset.preprocessing_datasets.VideoCaptionMergedDataset
¶
VideoCaptionMergedDataset(data_merge_path: str, args, transform, temporal_sample, transform_topcrop, start_idx: int = 0, seed: int = 42)
Bases: IterableDataset, Stateful
Merged dataset for video and caption data with stage-based processing.
Assumes that data_merge_path is a txt file in which each line points to a data folder and a JSON annotation file.
The folder should contain videos.
The json file should be a list of dictionaries with the following format:
```json
[
  {
    "path": "1gGQy4nxyUo-Scene-016.mp4",
    "resolution": {
      "width": 1920,
      "height": 1080
    },
    "size": 2439112,
    "fps": 25.0,
    "duration": 6.88,
    "num_frames": 172,
    "cap": [
      "A watermelon wearing a helmet is crushed by a hydraulic press, causing it to flatten and burst open."
    ]
  },
  ...
]
```
This dataset processes video and image data through a series of stages:
- Data validation
- Resolution filtering
- Frame sampling
- Transformation
- Text encoding
Source code in fastvideo/dataset/preprocessing_datasets.py
fastvideo.dataset.preprocessing_datasets.VideoTransformStage
¶
Bases: DatasetStage
Stage for video data transformation.
Source code in fastvideo/dataset/preprocessing_datasets.py
Functions¶
fastvideo.dataset.preprocessing_datasets.VideoTransformStage.process
¶process(batch: PreprocessBatch, **kwargs) -> PreprocessBatch
Transform video data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| batch | PreprocessBatch | Dataset batch with video information | required |

Returns:

| Type | Description |
|---|---|
| PreprocessBatch | Batch with transformed video tensor |
Source code in fastvideo/dataset/preprocessing_datasets.py
Functions¶
fastvideo.dataset.transform
¶
Classes¶
fastvideo.dataset.transform.CenterCropResizeVideo
¶
First use the short side as the crop length to center-crop the video, then resize it to the specified size.
Source code in fastvideo/dataset/transform.py
Functions¶
fastvideo.dataset.transform.CenterCropResizeVideo.__call__
¶Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| clip | tensor | Video clip to be cropped. Size is (T, C, H, W) | required |

Returns:

| Type | Description |
|---|---|
| torch.tensor | Scale-resized / center-cropped video clip. Size is (T, C, crop_size, crop_size) |
Source code in fastvideo/dataset/transform.py
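A hedged sketch of applying the transform to a dummy clip. The constructor argument (a target size) is an assumption inferred from the class description, not a documented signature:

```python
import torch

from fastvideo.dataset.transform import CenterCropResizeVideo

clip = torch.rand(16, 3, 480, 640)        # (T, C, H, W) dummy clip
transform = CenterCropResizeVideo(256)    # target size (assumed constructor argument)
out = transform(clip)                     # (T, C, crop_size, crop_size) per the docstring
```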
fastvideo.dataset.transform.Normalize255
¶
Convert tensor data type from uint8 to float and divide values by 255.0.
fastvideo.dataset.transform.TemporalRandomCrop
¶
Functions¶
fastvideo.dataset.transform.crop
¶
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| clip | tensor | Video clip to be cropped. Size is (T, C, H, W) | required |
Source code in fastvideo/dataset/transform.py
fastvideo.dataset.transform.normalize_video
¶
Convert tensor data type from uint8 to float, divide values by 255.0, and permute the dimensions of the clip tensor.
Parameters: clip (torch.tensor, dtype=torch.uint8), size (T, C, H, W).
Returns: clip (torch.tensor, dtype=torch.float), size (T, C, H, W).
Source code in fastvideo/dataset/transform.py
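A brief usage sketch with a dummy uint8 clip:

```python
import torch

from fastvideo.dataset.transform import normalize_video

clip_uint8 = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)  # (T, C, H, W)
clip_float = normalize_video(clip_uint8)  # float tensor with values scaled by 1/255
```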
fastvideo.dataset.utils
¶
Functions¶
fastvideo.dataset.utils.collate_rows_from_parquet_schema
¶
collate_rows_from_parquet_schema(rows, parquet_schema, text_padding_length, cfg_rate=0.0, rng=None) -> dict[str, Any]
Collate rows from parquet files based on the provided schema. Dynamically processes tensor fields based on schema and returns batched data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| rows | | List of row dictionaries from parquet files | required |
| parquet_schema | | PyArrow schema defining the structure of the data | required |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Dict containing batched tensors and metadata |
Source code in fastvideo/dataset/utils.py
fastvideo.dataset.utils.get_torch_tensors_from_row_dict
¶
Get the latents and prompts from a row dictionary.
Source code in fastvideo/dataset/utils.py
fastvideo.dataset.utils.pad
¶
Pad or crop an embedding [L, D] to exactly padding_length tokens.
Returns:
- [L, D] tensor in pinned CPU memory
- [L] attention mask in pinned CPU memory
Source code in fastvideo/dataset/utils.py
fastvideo.dataset.validation_dataset
¶
Classes¶
fastvideo.dataset.validation_dataset.ValidationDataset
¶
ValidationDataset(filename: str)
Bases: IterableDataset