fastvideo.v1.dataset.parquet_dataset_iterable_style
Module Contents
Classes
LatentsParquetIterStyleDataset | Efficient loader for video-text data from a directory of Parquet files.
Functions
build_parquet_iterable_style_dataloader | Build a dataloader for the LatentsParquetIterStyleDataset.
|
shard_parquet_files_across_sp_groups_and_workers | Shard parquet files across SP groups and workers in a balanced way.
Data
API
- class fastvideo.v1.dataset.parquet_dataset_iterable_style.BatchIterator(files, batch_size, text_padding_length, keys, worker_num_samples, read_batch_size)
Initialization
- class fastvideo.v1.dataset.parquet_dataset_iterable_style.LatentsParquetIterStyleDataset(path: str, batch_size: int = 1024, cfg_rate: float = 0.1, num_workers: int = 1, drop_last: bool = True, text_padding_length: int = 512, seed: int = 42, read_batch_size: int = 32, parquet_schema: pyarrow.Schema = None)
Bases: torch.utils.data.IterableDataset
Efficient loader for video-text data from a directory of Parquet files.
Initialization
- fastvideo.v1.dataset.parquet_dataset_iterable_style.build_parquet_iterable_style_dataloader(path: str, batch_size: int, num_data_workers: int, cfg_rate: float = 0.0, drop_last: bool = True, text_padding_length: int = 512, seed: int = 42, read_batch_size: int = 32) → tuple[fastvideo.v1.dataset.parquet_dataset_iterable_style.LatentsParquetIterStyleDataset, torchdata.stateful_dataloader.StatefulDataLoader]
Build a dataloader for the LatentsParquetIterStyleDataset.
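A minimal usage sketch based only on the signature above. The helper names `dataloader_kwargs` and `build_loader` are hypothetical, the default values are copied from the documented signature, and the data directory is a placeholder. Whether the call needs distributed state to be initialized first is not stated here, so treat that as something to verify.

```python
from typing import Any


def dataloader_kwargs(path: str, batch_size: int,
                      num_data_workers: int) -> dict[str, Any]:
    """Hypothetical helper: collect arguments matching the documented signature."""
    return {
        "path": path,
        "batch_size": batch_size,
        "num_data_workers": num_data_workers,
        # The remaining values repeat the documented defaults explicitly.
        "cfg_rate": 0.0,
        "drop_last": True,
        "text_padding_length": 512,
        "seed": 42,
        "read_batch_size": 32,
    }


def build_loader(path: str, batch_size: int, num_data_workers: int):
    """Hypothetical wrapper: returns the (dataset, dataloader) pair."""
    # Imported lazily so this sketch can be loaded without FastVideo installed.
    from fastvideo.v1.dataset.parquet_dataset_iterable_style import (
        build_parquet_iterable_style_dataloader,
    )

    dataset, dataloader = build_parquet_iterable_style_dataloader(
        **dataloader_kwargs(path, batch_size, num_data_workers))
    return dataset, dataloader
```

Calling `build_loader("/path/to/parquet_dir", 4, 2)` would unpack the documented return tuple into the dataset and its `StatefulDataLoader`.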
- fastvideo.v1.dataset.parquet_dataset_iterable_style.shard_parquet_files_across_sp_groups_and_workers(path: str, num_sp_groups: int, num_workers: int, seed: int = 42) → tuple[list[list[str]], list[int], list[dict[str, int]]]
Shard parquet files across SP groups and workers in a balanced way.
- Parameters:
path – Directory containing parquet files
num_sp_groups – Number of SP groups to shard across
num_workers – Number of workers per SP group
seed – Random seed for shuffling
- Returns:
Tuple containing:
List of lists of parquet files for each shard
List of total samples per shard
List of dictionaries mapping file paths to their lengths
- Return type:
tuple[list[list[str]], list[int], list[dict[str, int]]]
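To illustrate what balanced sharding can look like, here is a self-contained sketch. It is not the library's implementation. It assumes one shard per (SP group, worker) pair, takes a hypothetical `file_lengths` mapping instead of reading a directory, and uses a greedy strategy that always assigns the next-largest file to the shard with the fewest samples so far. The function name `shard_balanced` is likewise hypothetical. The return value mirrors the three-part tuple documented above.

```python
import heapq
import random


def shard_balanced(
    file_lengths: dict[str, int],
    num_sp_groups: int,
    num_workers: int,
    seed: int = 42,
) -> tuple[list[list[str]], list[int], list[dict[str, int]]]:
    """Greedy balanced sharding sketch (assumed strategy, not FastVideo's code)."""
    # Assumption: one shard per (SP group, worker) pair.
    num_shards = num_sp_groups * num_workers

    # Seeded shuffle, then a stable sort by descending length, so the seed
    # only decides the order among files of equal length.
    files = sorted(file_lengths)
    random.Random(seed).shuffle(files)
    files.sort(key=lambda f: file_lengths[f], reverse=True)

    shards: list[list[str]] = [[] for _ in range(num_shards)]
    totals = [0] * num_shards
    lengths: list[dict[str, int]] = [{} for _ in range(num_shards)]

    # Min-heap of (current sample count, shard index).
    heap = [(0, i) for i in range(num_shards)]
    heapq.heapify(heap)

    for f in files:
        total, i = heapq.heappop(heap)  # lightest shard so far
        shards[i].append(f)
        lengths[i][f] = file_lengths[f]
        totals[i] = total + file_lengths[f]
        heapq.heappush(heap, (totals[i], i))

    return shards, totals, lengths


if __name__ == "__main__":
    demo = {"a.parquet": 100, "b.parquet": 60, "c.parquet": 40,
            "d.parquet": 40, "e.parquet": 30, "f.parquet": 30}
    shards, totals, lengths = shard_balanced(demo, num_sp_groups=2, num_workers=1)
    print(shards, totals)
```

With this greedy rule, the gap between the heaviest and lightest shard never exceeds the size of the largest single file, which is usually adequate when files are of similar size.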