
`fastvideo.v1.dataset.parquet_dataset_map_style`

Module Contents

Classes

`DP_SP_BatchSampler`	A simple sequential batch sampler that yields batches of indices.
`LatentsParquetMapStyleDataset`	Return latents[B,C,T,H,W] and embeddings[B,L,D] in pinned CPU memory. Note: using Parquet for a map-style dataset is not efficient; it is kept mainly for backward compatibility and debugging.

Functions

`build_parquet_map_style_dataloader`
`get_parquet_files_and_length`
`passthrough`
`read_row_from_parquet_file`	Read a row from a parquet file.

Data

logger

API

class fastvideo.v1.dataset.parquet_dataset_map_style.DP_SP_BatchSampler(batch_size: int, dataset_size: int, num_sp_groups: int, sp_world_size: int, global_rank: int, drop_last: bool = True, drop_first_row: bool = False, seed: int = 0)

Bases: torch.utils.data.Sampler[list[int]]

A simple sequential batch sampler that yields batches of indices.

Initialization
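The constructor arguments suggest that batches are split across sequence-parallel (SP) groups, with `global_rank // sp_world_size` selecting a rank's group so that all ranks in one SP group see the same batches. The plain-Python sketch below illustrates that idea under those assumptions. The class name is hypothetical, it is not fastvideo's implementation, and it ignores `seed`, since this page does not say how (or whether) indices are shuffled.

```python
class SequentialSPBatchSampler:
    """Hypothetical stand-in for DP_SP_BatchSampler (not the library's code).

    Cuts the index range into consecutive batches and deals them round-robin
    to SP groups; every rank inside one SP group receives the same batches.
    """

    def __init__(self, batch_size, dataset_size, num_sp_groups, sp_world_size,
                 global_rank, drop_last=True, drop_first_row=False):
        indices = list(range(dataset_size))
        if drop_first_row:
            # Skip row 0, as the flag name suggests.
            indices = indices[1:]
        if drop_last:
            # Keep only enough indices to give every SP group the same
            # number of full batches.
            per_round = batch_size * num_sp_groups
            indices = indices[:len(indices) // per_round * per_round]
        batches = [indices[i:i + batch_size]
                   for i in range(0, len(indices), batch_size)]
        # Ranks 0..sp_world_size-1 form group 0, the next sp_world_size ranks
        # form group 1, and so on.
        sp_group = global_rank // sp_world_size
        self.batches = batches[sp_group::num_sp_groups]

    def __iter__(self):
        return iter(self.batches)

    def __len__(self):
        return len(self.batches)
```

For example, with `batch_size=2`, `dataset_size=10`, two SP groups of two ranks each, ranks 0 and 1 both receive `[[0, 1], [4, 5]]`, while ranks 2 and 3 receive `[[2, 3], [6, 7]]`; indices 8 and 9 are dropped because they cannot fill a batch for both groups.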

class fastvideo.v1.dataset.parquet_dataset_map_style.LatentsParquetMapStyleDataset(path: str, batch_size: int, parquet_schema: pyarrow.Schema, cfg_rate: float = 0.0, seed: int = 42, drop_last: bool = True, drop_first_row: bool = False, text_padding_length: int = 512)

Bases: torch.utils.data.Dataset

Return latents[B,C,T,H,W] and embeddings[B,L,D] in pinned CPU memory. Note: using Parquet for a map-style dataset is not efficient; it is kept mainly for backward compatibility and debugging.
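Embeddings can only be stacked into an `[B, L, D]` batch if every sample has the same length `L`, and the `text_padding_length` parameter suggests that this is the fixed `L`. The sketch below assumes zero-padding at the end plus a 0/1 mask marking real tokens; the helper name and the padding scheme are hypothetical, since this page does not document how padding is done. Plain lists stand in for tensors to keep it dependency-free.

```python
def pad_text_embedding(embedding, text_padding_length):
    """Pad or truncate an [L, D] embedding (a list of rows) to a fixed length.

    Hypothetical helper. Returns (padded, mask), where mask[i] is 1 for a
    real token and 0 for a padding row.
    """
    dim = len(embedding[0]) if embedding else 0
    # Truncate if the sequence is longer than the target length.
    kept = [list(row) for row in embedding[:text_padding_length]]
    n_pad = text_padding_length - len(kept)
    # Append zero rows so every sample ends up with exactly text_padding_length rows.
    padded = kept + [[0.0] * dim for _ in range(n_pad)]
    mask = [1] * len(kept) + [0] * n_pad
    return padded, mask
```

With a fixed length, per-sample results can be stacked along a new batch axis; with PyTorch tensors, calling `pin_memory()` on the stacked batch is what makes a later `.to(device, non_blocking=True)` copy asynchronous.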

Initialization

get_validation_negative_prompt() → tuple[torch.Tensor, torch.Tensor, str]

Get the negative prompt for validation. This method ensures the negative prompt is loaded and cached. Returns the processed negative prompt data (latents, embeddings, masks, info).
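The method is described as loading the negative prompt and caching it. A generic load-once pattern looks like the sketch below; the class, the `loader` callable, and the sentinel are hypothetical, and the page does not say where the prompt is loaded from.

```python
_MISSING = object()  # sentinel, so a loader that returns None is still cached


class NegativePromptCache:
    """Hypothetical illustration of load-once caching (not fastvideo's code)."""

    def __init__(self, loader):
        self._loader = loader  # callable that reads and processes the prompt
        self._cached = _MISSING

    def get(self):
        # Run the loader on the first call only; later calls return the
        # same object without touching storage again.
        if self._cached is _MISSING:
            self._cached = self._loader()
        return self._cached
```

Repeated validation runs then pay the loading cost once.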

fastvideo.v1.dataset.parquet_dataset_map_style.build_parquet_map_style_dataloader(path, batch_size, num_data_workers, parquet_schema, cfg_rate=0.0, drop_last=True, drop_first_row=False, text_padding_length=512, seed=42) → tuple[fastvideo.v1.dataset.parquet_dataset_map_style.LatentsParquetMapStyleDataset, torchdata.stateful_dataloader.StatefulDataLoader]
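The function returns the dataset together with a torchdata `StatefulDataLoader`, a drop-in replacement for `torch.utils.data.DataLoader` that can also save and restore its iteration state via `state_dict()` / `load_state_dict()`. Because the dataset itself takes `batch_size` and the module defines an identity-looking `passthrough(batch)`, one plausible wiring is: a batch sampler yields lists of indices, the dataset turns each list into a whole batch, and the collate function returns it unchanged. The single-process loop below sketches that assumption; all names are hypothetical. In PyTorch terms it roughly corresponds to a loader built with `batch_size=None`, the batch sampler passed as `sampler`, and `collate_fn=passthrough`.

```python
def passthrough_collate(batch):
    # Identity collate: the dataset already returns a fully formed batch.
    return batch


def iterate_batches(dataset, batch_sampler, collate_fn=passthrough_collate):
    """Hypothetical, single-process stand-in for the data loader loop."""
    for batch_indices in batch_sampler:
        # A dataset whose __getitem__ accepts a list of indices returns one batch.
        yield collate_fn(dataset[batch_indices])


class ToyBatchDataset:
    """Toy map-style dataset that is indexed with a list of indices."""

    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, indices):
        return [self.rows[i] for i in indices]

    def __len__(self):
        return len(self.rows)
```

For example, `list(iterate_batches(ToyBatchDataset(list("abcdef")), [[0, 1], [4, 5]]))` yields `[["a", "b"], ["e", "f"]]`.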

fastvideo.v1.dataset.parquet_dataset_map_style.get_parquet_files_and_length(path: str)
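This function has no description on this page. Judging only from its name, and from `read_row_from_parquet_file`, which takes a file list plus per-file `lengths`, it plausibly returns the Parquet files under `path` with their row counts. The sketch below assumes exactly that; the recursive search, the sorting, the `(files, lengths)` return shape, and the helper names are all assumptions. The default row counter uses pyarrow's `ParquetFile(...).metadata.num_rows`, which takes the count from the file's metadata instead of reading the data.

```python
import os


def _pyarrow_num_rows(path):
    # Row count from the Parquet file metadata; requires pyarrow.
    import pyarrow.parquet as pq
    return pq.ParquetFile(path).metadata.num_rows


def find_parquet_files_and_lengths(path, num_rows=_pyarrow_num_rows):
    """Hypothetical sketch: list *.parquet files under `path` and their row counts."""
    files = []
    for root, _dirs, names in os.walk(path):
        for name in names:
            if name.endswith(".parquet"):
                files.append(os.path.join(root, name))
    # Sort so that global row indices stay stable between runs.
    files.sort()
    lengths = [num_rows(f) for f in files]
    return files, lengths
```

Passing `num_rows` as a parameter keeps the file walk testable without pyarrow installed.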

fastvideo.v1.dataset.parquet_dataset_map_style.logger: 'init_logger(…)'

fastvideo.v1.dataset.parquet_dataset_map_style.passthrough(batch)

fastvideo.v1.dataset.parquet_dataset_map_style.read_row_from_parquet_file(parquet_files: list[str], global_row_idx: int, lengths: list[int]) → dict[str, Any]

Read a row from a parquet file.
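Given per-file row counts, a global row index can be mapped to a file and a row within it by treating the files as one table concatenated in order: take cumulative sums of `lengths` and binary-search them. The sketch below assumes that interpretation of the parameters. `locate_row` and `read_row` are hypothetical names, and `read_row` reads the whole file with `pyarrow.parquet.read_table` before slicing out one row, which is simple but, in line with the class note above, not efficient.

```python
import bisect
from itertools import accumulate


def locate_row(global_row_idx, lengths):
    """Hypothetical helper: map a global row index to (file_index, local_row_index)."""
    cumulative = list(accumulate(lengths))  # e.g. [3, 5, 2] -> [3, 8, 10]
    total = cumulative[-1] if cumulative else 0
    if not 0 <= global_row_idx < total:
        raise IndexError(f"row {global_row_idx} out of range for {total} rows")
    # First file whose cumulative end lies beyond the index; empty files are skipped.
    file_idx = bisect.bisect_right(cumulative, global_row_idx)
    start = cumulative[file_idx - 1] if file_idx > 0 else 0
    return file_idx, global_row_idx - start


def read_row(parquet_files, global_row_idx, lengths):
    # Requires pyarrow. Loads the whole file, then keeps one row as a dict.
    import pyarrow.parquet as pq
    file_idx, local_idx = locate_row(global_row_idx, lengths)
    table = pq.read_table(parquet_files[file_idx])
    return table.slice(local_idx, 1).to_pylist()[0]
```

For `lengths = [3, 5, 2]`, global row 7 maps to file 1, local row 4, and global row 9 maps to file 2, local row 1.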

Parameters:

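parquet_files (list[str]) – Paths of the Parquet files that together make up the dataset, in order.
global_row_idx (int) – Index of the row across all files, counted as if the files were concatenated.
lengths (list[int]) – Number of rows in each file, aligned with parquet_files.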

Returns: