Skip to content

turbodiffusion

TurboDiffusion pipeline configurations.

TurboDiffusion uses RCM (recurrent Consistency Model) scheduler with SLA (Sparse-Linear Attention) for fast 1-4 step video generation.

Classes

fastvideo.configs.pipelines.turbodiffusion.TurboDiffusionI2VConfig dataclass

TurboDiffusionI2VConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = 5.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = WanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = CLIPVisionConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = 0.9, precision: str = 'bf16', warp_denoising_step: bool = True)

Bases: PipelineConfig

Base configuration for TurboDiffusion I2V pipeline.

Uses RCM scheduler with sigma_max=200 for 1-4 step generation. Uses boundary_ratio=0.9 for high-noise to low-noise model switching.

fastvideo.configs.pipelines.turbodiffusion.TurboDiffusionI2V_A14B_Config dataclass

TurboDiffusionI2V_A14B_Config(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = 5.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = WanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = CLIPVisionConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = 0.9, precision: str = 'bf16', warp_denoising_step: bool = True)

Bases: TurboDiffusionI2VConfig

Configuration for TurboDiffusion I2V A14B model.

fastvideo.configs.pipelines.turbodiffusion.TurboDiffusionT2VConfig dataclass

TurboDiffusionT2VConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = 3.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = WanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, precision: str = 'bf16', warp_denoising_step: bool = True)

Bases: PipelineConfig

Base configuration for TurboDiffusion T2V pipeline.

Uses RCM scheduler with sigma_max=80 for 1-4 step generation. No boundary_ratio (single model, no switching).

fastvideo.configs.pipelines.turbodiffusion.TurboDiffusionT2V_14B_Config dataclass

TurboDiffusionT2V_14B_Config(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = 5.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = WanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, precision: str = 'bf16', warp_denoising_step: bool = True)

Bases: TurboDiffusionT2VConfig

Configuration for TurboDiffusion T2V 14B model.

Uses same config as 1.3B but with higher flow_shift for 14B model.

fastvideo.configs.pipelines.turbodiffusion.TurboDiffusionT2V_1_3B_Config dataclass

TurboDiffusionT2V_1_3B_Config(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = 3.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = WanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, precision: str = 'bf16', warp_denoising_step: bool = True)

Bases: TurboDiffusionT2VConfig

Configuration for TurboDiffusion T2V 1.3B model.