fastvideo.v1.dataset.parquet_dataset_map_style

Module Contents

Classes

DP_SP_BatchSampler

A simple sequential batch sampler that yields batches of indices.

LatentsParquetMapStyleDataset

Returns latents [B, C, T, H, W] and embeddings [B, L, D] in pinned CPU memory. Note: using Parquet for a map-style dataset is not efficient; it is kept mainly for backward compatibility and debugging.

Functions

Data

API

class fastvideo.v1.dataset.parquet_dataset_map_style.DP_SP_BatchSampler(batch_size: int, dataset_size: int, num_sp_groups: int, sp_world_size: int, global_rank: int, drop_last: bool = True, drop_first_row: bool = False, seed: int = 0)

Bases: torch.utils.data.Sampler[typing.List[int]]

A simple sequential batch sampler that yields batches of indices.
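The following is a minimal sketch of how a sequential batch sampler might split indices across sequence-parallel (SP) groups. It is not the FastVideo implementation. The class name `SequentialShardedBatchSampler` is hypothetical, and two behaviors are assumptions: that ranks in the same SP group receive identical batches, and that `drop_last` trims the index list so every group gets the same number of full batches. The sketch ignores `seed` and `drop_first_row`, whose semantics are not described here.

```python
from typing import Iterator, List


class SequentialShardedBatchSampler:
    """Hypothetical stand-in for illustration; not the library class."""

    def __init__(self, batch_size: int, dataset_size: int, num_sp_groups: int,
                 sp_world_size: int, global_rank: int,
                 drop_last: bool = True) -> None:
        indices = list(range(dataset_size))

        # Assumption: with drop_last, keep only as many indices as fill
        # whole batches for every SP group.
        if drop_last:
            per_round = batch_size * num_sp_groups
            indices = indices[:(len(indices) // per_round) * per_round]

        # Assumption: consecutive ranks form one SP group and share a shard.
        sp_group = global_rank // sp_world_size
        shard = indices[sp_group::num_sp_groups]

        # Cut the shard into consecutive batches of indices.
        self.batches: List[List[int]] = [
            shard[i:i + batch_size] for i in range(0, len(shard), batch_size)
        ]

    def __iter__(self) -> Iterator[List[int]]:
        return iter(self.batches)

    def __len__(self) -> int:
        return len(self.batches)
```

Under these assumptions, two ranks with the same `global_rank // sp_world_size` iterate over the same batches, while different SP groups see disjoint indices.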

Initialization

class fastvideo.v1.dataset.parquet_dataset_map_style.LatentsParquetMapStyleDataset(path: str, batch_size: int, cfg_rate: float = 0.0, seed: int = 42, drop_last: bool = True, drop_first_row: bool = False, text_padding_length: int = 512)

Bases: torch.utils.data.Dataset

Returns latents [B, C, T, H, W] and embeddings [B, L, D] in pinned CPU memory. Note: using Parquet for a map-style dataset is not efficient; it is kept mainly for backward compatibility and debugging.

Initialization

get_validation_negative_prompt() → tuple[torch.Tensor, torch.Tensor, torch.Tensor, str]

Get the negative prompt for validation. This method ensures the negative prompt is loaded and cached properly. Returns the processed negative prompt data (latents, embeddings, masks, info).
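The load-once-and-cache behavior described above can be illustrated with a generic lazy cache. This is only a sketch of the pattern, not the method's actual code: the class `NegativePromptCache`, its `loader` argument, and the `load_count` counter are all hypothetical names introduced for illustration, and the placeholder strings stand in for the real tensors.

```python
from typing import Callable, Optional, Tuple


class NegativePromptCache:
    """Hypothetical helper showing a load-once, reuse-afterwards pattern."""

    def __init__(self, loader: Callable[[], Tuple]) -> None:
        self._loader = loader            # assumed: reads and processes the prompt
        self._cached: Optional[Tuple] = None
        self.load_count = 0              # counts how often the loader ran

    def get(self) -> Tuple:
        # Run the loader only on the first call; later calls reuse the result.
        if self._cached is None:
            self._cached = self._loader()
            self.load_count += 1
        return self._cached


def fake_loader() -> Tuple[str, str, str, str]:
    # Placeholder values standing in for (latents, embeddings, masks, info).
    return ("latents", "embeddings", "masks", "info")


cache = NegativePromptCache(fake_loader)
first = cache.get()
second = cache.get()
```

Both calls return the same tuple object, and the loader runs only once.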

keys

[('vae_latent', 'latent'), 'text_embedding']

fastvideo.v1.dataset.parquet_dataset_map_style.build_parquet_map_style_dataloader(path, batch_size, num_data_workers, cfg_rate=0.0, drop_last=True, drop_first_row=False, text_padding_length=512, seed=42) → Tuple[fastvideo.v1.dataset.parquet_dataset_map_style.LatentsParquetMapStyleDataset, torchdata.stateful_dataloader.StatefulDataLoader]

fastvideo.v1.dataset.parquet_dataset_map_style.get_parquet_files_and_length(path: str)

fastvideo.v1.dataset.parquet_dataset_map_style.logger

'init_logger(…)'

fastvideo.v1.dataset.parquet_dataset_map_style.passthrough(batch)

fastvideo.v1.dataset.parquet_dataset_map_style.read_row_from_parquet_file(parquet_files: List[str], global_row_idx: int, lengths: List[int]) → Dict[str, Any]

Read a row from a parquet file.

Parameters:
  • parquet_files – List[str]

  • global_row_idx – int

  • lengths – List[int]
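As a rough illustration of how a global row index could be resolved against the parameters above, the sketch below maps `global_row_idx` to a file index and a row offset within that file. It assumes that `lengths[i]` is the row count of `parquet_files[i]`, which the signature suggests but this page does not state. The helper name `locate_row` is hypothetical, and the actual Parquet read is left out.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def locate_row(global_row_idx: int, lengths: List[int]) -> Tuple[int, int]:
    """Map a global row index to (file_index, local_row_index).

    Hypothetical helper; assumes lengths[i] is the row count of file i.
    """
    if global_row_idx < 0 or global_row_idx >= sum(lengths):
        raise IndexError(f"row {global_row_idx} is out of range")

    # Cumulative row counts: ends[i] is one past the last global row of file i.
    ends = list(accumulate(lengths))

    # The first file whose cumulative end exceeds the index holds the row.
    file_idx = bisect_right(ends, global_row_idx)

    # Subtract the global index of that file's first row.
    start = ends[file_idx] - lengths[file_idx]
    return file_idx, global_row_idx - start
```

With `lengths = [3, 2, 4]`, global row 3 resolves to the first row of the second file, and global row 8 to the last row of the third file.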

Returns: