parquet_dataset_map_style
Classes
fastvideo.dataset.parquet_dataset_map_style.DP_SP_BatchSampler
DP_SP_BatchSampler(batch_size: int, dataset_size: int, num_sp_groups: int, sp_world_size: int, global_rank: int, drop_last: bool = True, drop_first_row: bool = False, seed: int = 0)
A simple sequential batch sampler that yields batches of indices.
Source code in fastvideo/dataset/parquet_dataset_map_style.py
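How indices are divided among ranks is not described here. The sketch below shows one plausible scheme, not the FastVideo implementation. It assumes that ranks are grouped contiguously into sequence-parallel (SP) groups (`global_rank // sp_world_size`), that every rank in an SP group receives the same batches, and that SP groups take sequential batches round-robin. The function name `sketch_batches_for_rank` is hypothetical, and `drop_first_row` and `seed` are left out.

```python
def sketch_batches_for_rank(
    batch_size: int,
    dataset_size: int,
    num_sp_groups: int,
    sp_world_size: int,
    global_rank: int,
    drop_last: bool = True,
) -> list[list[int]]:
    """Hypothetical illustration of DP/SP-aware sequential batching."""
    # Assumption: ranks [0, sp_world_size) form SP group 0, and so on.
    sp_group = global_rank // sp_world_size
    if not 0 <= sp_group < num_sp_groups:
        raise ValueError("global_rank does not belong to any SP group")

    indices = list(range(dataset_size))
    if drop_last:
        # Keep only the samples that fill whole batches for every SP group.
        usable = dataset_size - dataset_size % (batch_size * num_sp_groups)
        indices = indices[:usable]

    # Sequential batches over the (possibly truncated) index range.
    batches = [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]
    # Assumption: SP groups pick batches round-robin.
    return batches[sp_group::num_sp_groups]
```

Under these assumptions, ranks 0 and 1 of a two-way SP group iterate over identical index batches, while ranks in the next SP group see a disjoint set.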
fastvideo.dataset.parquet_dataset_map_style.LatentsParquetMapStyleDataset
LatentsParquetMapStyleDataset(path: str, batch_size: int, parquet_schema: Schema, cfg_rate: float = 0.0, seed: int = 42, drop_last: bool = True, drop_first_row: bool = False, text_padding_length: int = 512)
Bases: Dataset
Returns latents [B, C, T, H, W] and embeddings [B, L, D] in pinned CPU memory. Note: Parquet is not an efficient backing format for a map-style dataset; this class is kept mainly for backward compatibility and debugging.
Functions
fastvideo.dataset.parquet_dataset_map_style.LatentsParquetMapStyleDataset.__getitems__
Fetches a batch by calling read_row_from_parquet_file once for each index.
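`__getitems__` is the optional batched-access hook that recent PyTorch `DataLoader` versions call with a whole list of indices when a map-style dataset defines it. The toy class below is a minimal sketch of that per-index pattern with no parquet or PyTorch dependency. `ToyBatchedDataset` and `_read_row` are hypothetical stand-ins; `_read_row` only plays the role of a per-row read, and the real method may collate its results differently.

```python
from typing import Any


class ToyBatchedDataset:
    """Hypothetical stand-in showing the per-index batch fetch pattern."""

    def __init__(self, rows: list[dict[str, Any]]) -> None:
        self._rows = rows

    def __len__(self) -> int:
        return len(self._rows)

    def _read_row(self, index: int) -> dict[str, Any]:
        # Stand-in for a per-row read from storage.
        return self._rows[index]

    def __getitems__(self, indices: list[int]) -> list[dict[str, Any]]:
        # One read per requested index, preserving the requested order.
        return [self._read_row(i) for i in indices]


dataset = ToyBatchedDataset([{"id": i} for i in range(5)])
batch = dataset.__getitems__([3, 0, 4])
```

Because each index triggers its own read, the cost of a batch grows linearly with its size, which is consistent with the efficiency caveat in the class description.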
fastvideo.dataset.parquet_dataset_map_style.LatentsParquetMapStyleDataset.get_validation_negative_prompt
Get the negative prompt for validation. This method ensures the negative prompt is loaded and cached properly. Returns the processed negative prompt data (latents, embeddings, masks, info).
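The load-once-then-reuse behavior described above can be sketched as a lazy cache. This is an illustration under stated assumptions, not the class's actual code: `NegativePromptCache`, its `loader` callable, and the placeholder tuple contents are all hypothetical.

```python
from typing import Any, Callable, Optional


class NegativePromptCache:
    """Hypothetical lazy cache: load on first request, reuse afterwards."""

    def __init__(self, loader: Callable[[], tuple[Any, ...]]) -> None:
        self._loader = loader
        self._cached: Optional[tuple[Any, ...]] = None
        self.load_count = 0  # exposed only to make the caching observable

    def get(self) -> tuple[Any, ...]:
        if self._cached is None:
            # First call: run the loader and remember its result.
            self._cached = self._loader()
            self.load_count += 1
        return self._cached


# Placeholder values standing in for (latents, embeddings, masks, info).
cache = NegativePromptCache(lambda: ("latents", "embeddings", "masks", {"prompt": ""}))
first = cache.get()
second = cache.get()
```

Repeated validation passes then reuse the same object instead of re-reading and re-processing the negative prompt.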
Functions
fastvideo.dataset.parquet_dataset_map_style.read_row_from_parquet_file
read_row_from_parquet_file(parquet_files: list[str], global_row_idx: int, lengths: list[int]) -> dict[str, Any]
Read a row from a parquet file.
Args:
    parquet_files: List[str]
    global_row_idx: int
    lengths: List[int]
Returns:
    dict[str, Any]
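The `lengths` argument suggests that the global row index is translated into a file and a row within that file. Below is a minimal sketch of that translation. It assumes that `lengths[i]` is the row count of `parquet_files[i]` and that rows are numbered consecutively across files in list order. `locate_row` is a hypothetical helper that performs no parquet I/O.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_row(lengths: list[int], global_row_idx: int) -> tuple[int, int]:
    """Map a global row index to (file_index, local_row_index)."""
    total = sum(lengths)
    if not 0 <= global_row_idx < total:
        raise IndexError(f"row {global_row_idx} out of range for {total} rows")
    # Cumulative row counts: ends[i] is one past the last global row of file i.
    ends = list(accumulate(lengths))
    # The first file whose end lies strictly beyond the index holds the row.
    file_index = bisect_right(ends, global_row_idx)
    start = ends[file_index - 1] if file_index > 0 else 0
    return file_index, global_row_idx - start
```

With `lengths = [3, 2, 4]`, global row 3 is the first row of the second file, so `locate_row([3, 2, 4], 3)` returns `(1, 0)`. The row itself would then be read from `parquet_files[file_index]` at the local offset.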