pipelines

Classes

fastvideo.configs.pipelines.CosmosConfig dataclass

CosmosConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: int = 6, flow_shift: float = 1.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = CosmosVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = CosmosVAEConfig(), vae_precision: str = 'fp16', vae_tiling: bool = True, vae_sp: bool = True, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5LargeConfig(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('bf16',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_large_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, conditioning_strategy: str = 'frame_replace', min_num_conditional_frames: int = 1, max_num_conditional_frames: int = 2, sigma_conditional: float = 0.0001, sigma_data: float = 1.0, state_ch: int = 16, state_t: int = 24, text_encoder_class: str = 'T5')

Bases: PipelineConfig

Configuration for the Cosmos2 Video2World pipeline, matching the diffusers implementation.

fastvideo.configs.pipelines.FastHunyuanConfig dataclass

FastHunyuanConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: int = 6, flow_shift: int = 17, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = HunyuanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = HunyuanVAEConfig(), vae_precision: str = 'fp16', vae_tiling: bool = True, vae_sp: bool = True, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (LlamaConfig(), CLIPTextConfig()))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp16', 'fp16'))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (llama_preprocess_text, clip_preprocess_text))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (llama_postprocess_text, clip_postprocess_text))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None)

Bases: HunyuanConfig

Configuration specifically optimized for FastHunyuan weights.

fastvideo.configs.pipelines.Hunyuan15T2V480PConfig dataclass

Hunyuan15T2V480PConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: int = 5, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = HunyuanVideo15Config(), dit_precision: str = 'bf16', vae_config: VAEConfig = Hunyuan15VAEConfig(), vae_precision: str = 'fp16', vae_tiling: bool = True, vae_sp: bool = True, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (Qwen2_5_VLConfig(), T5Config()))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('bf16', 'fp32'))(), preprocess_text_funcs: tuple[Callable[[Any], Any], ...] = (lambda: (qwen_preprocess_text, byt5_preprocess_text))(), postprocess_text_funcs: tuple[Callable[..., Any], ...] = (lambda: (qwen_postprocess_text, byt5_postprocess_text))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, text_encoder_crop_start: int = PROMPT_TEMPLATE_TOKEN_LENGTH, text_encoder_max_lengths: tuple[int, ...] = (lambda: (1000 + PROMPT_TEMPLATE_TOKEN_LENGTH, 256))())

Bases: PipelineConfig

Base configuration for HunYuan pipeline architecture.

fastvideo.configs.pipelines.Hunyuan15T2V720PConfig dataclass

Hunyuan15T2V720PConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: int = 9, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = HunyuanVideo15Config(), dit_precision: str = 'bf16', vae_config: VAEConfig = Hunyuan15VAEConfig(), vae_precision: str = 'fp16', vae_tiling: bool = True, vae_sp: bool = True, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (Qwen2_5_VLConfig(), T5Config()))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('bf16', 'fp32'))(), preprocess_text_funcs: tuple[Callable[[Any], Any], ...] = (lambda: (qwen_preprocess_text, byt5_preprocess_text))(), postprocess_text_funcs: tuple[Callable[..., Any], ...] = (lambda: (qwen_postprocess_text, byt5_postprocess_text))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, text_encoder_crop_start: int = PROMPT_TEMPLATE_TOKEN_LENGTH, text_encoder_max_lengths: tuple[int, ...] = (lambda: (1000 + PROMPT_TEMPLATE_TOKEN_LENGTH, 256))())

Bases: Hunyuan15T2V480PConfig

Base configuration for HunYuan pipeline architecture.

fastvideo.configs.pipelines.HunyuanConfig dataclass

HunyuanConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: int = 6, flow_shift: int = 7, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = HunyuanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = HunyuanVAEConfig(), vae_precision: str = 'fp16', vae_tiling: bool = True, vae_sp: bool = True, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (LlamaConfig(), CLIPTextConfig()))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp16', 'fp16'))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (llama_preprocess_text, clip_preprocess_text))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (llama_postprocess_text, clip_postprocess_text))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None)

Bases: PipelineConfig

Base configuration for HunYuan pipeline architecture.

fastvideo.configs.pipelines.PipelineConfig dataclass

PipelineConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = None, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = DiTConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = VAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = True, vae_sp: bool = True, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (EncoderConfig(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None)

Base configuration for all pipeline architectures.

Functions

fastvideo.configs.pipelines.PipelineConfig.from_kwargs classmethod
from_kwargs(kwargs: dict[str, Any], config_cli_prefix: str = '') -> PipelineConfig

Load PipelineConfig from a kwargs dictionary.

kwargs: dictionary of keyword arguments
config_cli_prefix: prefix of CLI arguments for this PipelineConfig instance

Source code in fastvideo/configs/pipelines/base.py
@classmethod
def from_kwargs(cls,
                kwargs: dict[str, Any],
                config_cli_prefix: str = "") -> "PipelineConfig":
    """
    Load PipelineConfig from kwargs Dictionary.
    kwargs: dictionary of kwargs
    config_cli_prefix: prefix of CLI arguments for this PipelineConfig instance
    """
    from fastvideo.configs.pipelines.registry import (
        get_pipeline_config_cls_from_name)

    prefix_with_dot = f"{config_cli_prefix}." if (config_cli_prefix.strip()
                                                  != "") else ""
    model_path: str | None = kwargs.get(prefix_with_dot + 'model_path',
                                        None) or kwargs.get('model_path')
    pipeline_config_or_path: str | PipelineConfig | dict[
        str, Any] | None = kwargs.get(prefix_with_dot + 'pipeline_config',
                                      None) or kwargs.get('pipeline_config')
    if model_path is None:
        raise ValueError("model_path is required in kwargs")

    # 1. Get the pipeline config class from the registry
    pipeline_config_cls = get_pipeline_config_cls_from_name(model_path)

    # 2. Instantiate PipelineConfig
    if pipeline_config_cls is None:
        logger.warning(
            "Couldn't find pipeline config for %s. Using the default pipeline config.",
            model_path)
        pipeline_config = cls()
    else:
        pipeline_config = pipeline_config_cls()

    # 3. Load PipelineConfig from a json file or a PipelineConfig object if provided
    if isinstance(pipeline_config_or_path, str):
        pipeline_config.load_from_json(pipeline_config_or_path)
        kwargs[prefix_with_dot +
               'pipeline_config_path'] = pipeline_config_or_path
    elif isinstance(pipeline_config_or_path, PipelineConfig):
        pipeline_config = pipeline_config_or_path
    elif isinstance(pipeline_config_or_path, dict):
        pipeline_config.update_pipeline_config(pipeline_config_or_path)

    # 4. Update PipelineConfig from CLI arguments if provided
    kwargs[prefix_with_dot + 'model_path'] = model_path
    pipeline_config.update_config_from_dict(kwargs, config_cli_prefix)
    return pipeline_config
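
A minimal usage sketch, assuming a registered model ID (the ID and the flow_shift override below are illustrative):

from fastvideo.configs.pipelines import PipelineConfig

# Build a config from a flat kwargs dict. Per the source above, the dict
# form of "pipeline_config" updates matching fields on the selected config.
kwargs = {
    "model_path": "FastVideo/FastHunyuan-diffusers",
    "pipeline_config": {"flow_shift": 13},  # illustrative override
}
config = PipelineConfig.from_kwargs(kwargs)
print(type(config).__name__, config.flow_shift)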
fastvideo.configs.pipelines.PipelineConfig.from_pretrained classmethod
from_pretrained(model_path: str) -> PipelineConfig

Use the pipeline class setting from model_path to select the matching pipeline config.

Source code in fastvideo/configs/pipelines/base.py
@classmethod
def from_pretrained(cls, model_path: str) -> "PipelineConfig":
    """
    use the pipeline class setting from model_path to match the pipeline config
    """
    from fastvideo.configs.pipelines.registry import (
        get_pipeline_config_cls_from_name)
    pipeline_config_cls = get_pipeline_config_cls_from_name(model_path)

    return cast(PipelineConfig, pipeline_config_cls(model_path=model_path))
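
For example, a sketch with a registered model ID (illustrative; any ID or local path known to the registry works):

from fastvideo.configs.pipelines import PipelineConfig

# For a registered ID this returns the weight-specific subclass
# (e.g. FastHunyuanConfig) with model_path already populated.
config = PipelineConfig.from_pretrained("FastVideo/FastHunyuan-diffusers")
print(type(config).__name__, config.model_path)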

fastvideo.configs.pipelines.SlidingTileAttnConfig dataclass

SlidingTileAttnConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = None, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = DiTConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = VAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = True, vae_sp: bool = True, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (EncoderConfig(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, window_size: int = 16, stride: int = 8, height: int = 576, width: int = 1024, pad_to_square: bool = False, use_overlap_optimization: bool = True)

Bases: PipelineConfig

Configuration for sliding tile attention.

fastvideo.configs.pipelines.StepVideoT2VConfig dataclass

StepVideoT2VConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: int = 13, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = StepVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = StepVideoVAEConfig(), vae_precision: str = 'bf16', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (EncoderConfig(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (postprocess_text,))(), pos_magic: str = '超高清、HDR 视频、环境光、杜比全景声、画面稳定、流畅动作、逼真的细节、专业级构图、超现实主义、自然、生动、超细节、清晰。', neg_magic: str = '画面暗、低分辨率、不良手、文本、缺少手指、多余的手指、裁剪、低质量、颗粒状、签名、水印、用户名、模糊。', timesteps_scale: bool = False, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, precision: str = 'bf16')

Bases: PipelineConfig

Base configuration for StepVideo pipeline architecture.

fastvideo.configs.pipelines.WanI2V480PConfig dataclass

WanI2V480PConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = 3.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = WanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = CLIPVisionConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, precision: str = 'bf16', warp_denoising_step: bool = True)

Bases: WanT2V480PConfig

Base configuration for Wan I2V 14B 480P pipeline architecture.

fastvideo.configs.pipelines.WanI2V720PConfig dataclass

WanI2V720PConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = 5.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = WanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = CLIPVisionConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, precision: str = 'bf16', warp_denoising_step: bool = True)

Bases: WanI2V480PConfig

Base configuration for Wan I2V 14B 720P pipeline architecture.

fastvideo.configs.pipelines.WanT2V480PConfig dataclass

WanT2V480PConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = 3.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = WanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, precision: str = 'bf16', warp_denoising_step: bool = True)

Bases: PipelineConfig

Base configuration for Wan T2V 1.3B pipeline architecture.

fastvideo.configs.pipelines.WanT2V720PConfig dataclass

WanT2V720PConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = 5.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = WanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, precision: str = 'bf16', warp_denoising_step: bool = True)

Bases: WanT2V480PConfig

Base configuration for Wan T2V 14B 720P pipeline architecture.

Functions

fastvideo.configs.pipelines.get_pipeline_config_cls_from_name

get_pipeline_config_cls_from_name(pipeline_name_or_path: str) -> type[PipelineConfig]

Get the appropriate configuration class for a given pipeline name or path.

This function implements a multi-step lookup process to find the most suitable configuration class for a given pipeline. It follows this order:

1. Exact match in the PIPE_NAME_TO_CONFIG
2. Partial match in the PIPE_NAME_TO_CONFIG
3. Fallback to class name in the model_index.json
4. Otherwise, raise an error

Parameters:

Name Type Description Default
pipeline_name_or_path str

The name or path of the pipeline. This can be:

- A registered model ID (e.g., "FastVideo/FastHunyuan-diffusers")
- A local path to a model directory
- A model ID that will be downloaded

required

Returns:

Type Description
type[PipelineConfig]

Type[PipelineConfig]: The configuration class that best matches the pipeline. This will be one of:

- A specific weight configuration class if an exact match is found
- A fallback configuration class based on the pipeline architecture
- The base PipelineConfig class if no matches are found

Note
  • For local paths, the function will verify the model configuration
  • For remote models, it will attempt to download the model index
  • Warning messages are logged when falling back to less specific configurations
Source code in fastvideo/configs/pipelines/registry.py
def get_pipeline_config_cls_from_name(
        pipeline_name_or_path: str) -> type[PipelineConfig]:
    """Get the appropriate configuration class for a given pipeline name or path.

    This function implements a multi-step lookup process to find the most suitable
    configuration class for a given pipeline. It follows this order:
    1. Exact match in the PIPE_NAME_TO_CONFIG
    2. Partial match in the PIPE_NAME_TO_CONFIG
    3. Fallback to class name in the model_index.json
    4. else raise an error

    Args:
        pipeline_name_or_path (str): The name or path of the pipeline. This can be:
            - A registered model ID (e.g., "FastVideo/FastHunyuan-diffusers")
            - A local path to a model directory
            - A model ID that will be downloaded

    Returns:
        Type[PipelineConfig]: The configuration class that best matches the pipeline.
            This will be one of:
            - A specific weight configuration class if an exact match is found
            - A fallback configuration class based on the pipeline architecture
            - The base PipelineConfig class if no matches are found

    Note:
        - For local paths, the function will verify the model configuration
        - For remote models, it will attempt to download the model index
        - Warning messages are logged when falling back to less specific configurations
    """

    pipeline_config_cls: type[PipelineConfig] | None = None

    # First try exact match for specific weights
    if pipeline_name_or_path in PIPE_NAME_TO_CONFIG:
        pipeline_config_cls = PIPE_NAME_TO_CONFIG[pipeline_name_or_path]
        return pipeline_config_cls

    # Try partial matches (for local paths that might include the weight ID)
    for registered_id, config_class in PIPE_NAME_TO_CONFIG.items():
        if registered_id in pipeline_name_or_path:
            pipeline_config_cls = config_class
            break

    # If no match, try to use the fallback config
    if pipeline_config_cls is None:
        if os.path.exists(pipeline_name_or_path):
            config = verify_model_config_and_directory(pipeline_name_or_path)
        else:
            config = maybe_download_model_index(pipeline_name_or_path)
        logger.warning(
            "Trying to use the config from the model_index.json. FastVideo may not correctly identify the optimal config for this model in this situation."
        )

        pipeline_name = config["_class_name"]
        # Try to determine pipeline architecture for fallback
        for pipeline_type, detector in PIPELINE_DETECTOR.items():
            if detector(pipeline_name.lower()):
                pipeline_config_cls = PIPELINE_FALLBACK_CONFIG.get(
                    pipeline_type)
                break

        if pipeline_config_cls is not None:
            logger.warning(
                "No match found for pipeline %s, using fallback config %s.",
                pipeline_name_or_path, pipeline_config_cls)

    if pipeline_config_cls is None:
        raise ValueError(
            f"No match found for pipeline {pipeline_name_or_path}, please check the pipeline name or path."
        )

    return pipeline_config_cls
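
A short usage sketch (an exact registry hit returns the weight-specific class immediately; unknown names fall back to model_index.json detection or raise):

from fastvideo.configs.pipelines.registry import (
    get_pipeline_config_cls_from_name)

# Exact match in PIPE_NAME_TO_CONFIG; the registered ID below is the
# example used in the docstring above.
cls = get_pipeline_config_cls_from_name("FastVideo/FastHunyuan-diffusers")
config = cls()  # instantiate with the tuned defaults for these weights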

Modules

fastvideo.configs.pipelines.base

Classes

fastvideo.configs.pipelines.base.PipelineConfig dataclass
PipelineConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = None, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = DiTConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = VAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = True, vae_sp: bool = True, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (EncoderConfig(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None)

Base configuration for all pipeline architectures.

Functions
fastvideo.configs.pipelines.base.PipelineConfig.from_kwargs classmethod
from_kwargs(kwargs: dict[str, Any], config_cli_prefix: str = '') -> PipelineConfig

Load PipelineConfig from a kwargs dictionary.

kwargs: dictionary of keyword arguments
config_cli_prefix: prefix of CLI arguments for this PipelineConfig instance

Source code in fastvideo/configs/pipelines/base.py
@classmethod
def from_kwargs(cls,
                kwargs: dict[str, Any],
                config_cli_prefix: str = "") -> "PipelineConfig":
    """
    Load PipelineConfig from kwargs Dictionary.
    kwargs: dictionary of kwargs
    config_cli_prefix: prefix of CLI arguments for this PipelineConfig instance
    """
    from fastvideo.configs.pipelines.registry import (
        get_pipeline_config_cls_from_name)

    prefix_with_dot = f"{config_cli_prefix}." if (config_cli_prefix.strip()
                                                  != "") else ""
    model_path: str | None = kwargs.get(prefix_with_dot + 'model_path',
                                        None) or kwargs.get('model_path')
    pipeline_config_or_path: str | PipelineConfig | dict[
        str, Any] | None = kwargs.get(prefix_with_dot + 'pipeline_config',
                                      None) or kwargs.get('pipeline_config')
    if model_path is None:
        raise ValueError("model_path is required in kwargs")

    # 1. Get the pipeline config class from the registry
    pipeline_config_cls = get_pipeline_config_cls_from_name(model_path)

    # 2. Instantiate PipelineConfig
    if pipeline_config_cls is None:
        logger.warning(
            "Couldn't find pipeline config for %s. Using the default pipeline config.",
            model_path)
        pipeline_config = cls()
    else:
        pipeline_config = pipeline_config_cls()

    # 3. Load PipelineConfig from a json file or a PipelineConfig object if provided
    if isinstance(pipeline_config_or_path, str):
        pipeline_config.load_from_json(pipeline_config_or_path)
        kwargs[prefix_with_dot +
               'pipeline_config_path'] = pipeline_config_or_path
    elif isinstance(pipeline_config_or_path, PipelineConfig):
        pipeline_config = pipeline_config_or_path
    elif isinstance(pipeline_config_or_path, dict):
        pipeline_config.update_pipeline_config(pipeline_config_or_path)

    # 4. Update PipelineConfig from CLI arguments if provided
    kwargs[prefix_with_dot + 'model_path'] = model_path
    pipeline_config.update_config_from_dict(kwargs, config_cli_prefix)
    return pipeline_config
fastvideo.configs.pipelines.base.PipelineConfig.from_pretrained classmethod
from_pretrained(model_path: str) -> PipelineConfig

Use the pipeline class setting from model_path to select the matching pipeline config.

Source code in fastvideo/configs/pipelines/base.py
@classmethod
def from_pretrained(cls, model_path: str) -> "PipelineConfig":
    """
    use the pipeline class setting from model_path to match the pipeline config
    """
    from fastvideo.configs.pipelines.registry import (
        get_pipeline_config_cls_from_name)
    pipeline_config_cls = get_pipeline_config_cls_from_name(model_path)

    return cast(PipelineConfig, pipeline_config_cls(model_path=model_path))
fastvideo.configs.pipelines.base.STA_Mode

Bases: str, Enum

STA (Sliding Tile Attention) modes.

fastvideo.configs.pipelines.base.SlidingTileAttnConfig dataclass
SlidingTileAttnConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = None, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = DiTConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = VAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = True, vae_sp: bool = True, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (EncoderConfig(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, window_size: int = 16, stride: int = 8, height: int = 576, width: int = 1024, pad_to_square: bool = False, use_overlap_optimization: bool = True)

Bases: PipelineConfig

Configuration for sliding tile attention.

Functions

fastvideo.configs.pipelines.base.parse_int_list
parse_int_list(value: str) -> list[int]

Parse a comma-separated string of integers into a list.

Source code in fastvideo/configs/pipelines/base.py
def parse_int_list(value: str) -> list[int]:
    """Parse a comma-separated string of integers into a list."""
    if not value:
        return []
    return [int(x.strip()) for x in value.split(",")]
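
For example, parsing a CLI value such as a dmd_denoising_steps list:

from fastvideo.configs.pipelines.base import parse_int_list

parse_int_list("1000, 757, 522")  # -> [1000, 757, 522]
parse_int_list("")                # -> []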

fastvideo.configs.pipelines.cosmos

Classes

fastvideo.configs.pipelines.cosmos.CosmosConfig dataclass
CosmosConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: int = 6, flow_shift: float = 1.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = CosmosVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = CosmosVAEConfig(), vae_precision: str = 'fp16', vae_tiling: bool = True, vae_sp: bool = True, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5LargeConfig(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('bf16',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_large_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, conditioning_strategy: str = 'frame_replace', min_num_conditional_frames: int = 1, max_num_conditional_frames: int = 2, sigma_conditional: float = 0.0001, sigma_data: float = 1.0, state_ch: int = 16, state_t: int = 24, text_encoder_class: str = 'T5')

Bases: PipelineConfig

Configuration for the Cosmos2 Video2World pipeline, matching the diffusers implementation.

Functions

fastvideo.configs.pipelines.cosmos.t5_large_postprocess_text
t5_large_postprocess_text(outputs: BaseEncoderOutput) -> Tensor

Postprocess T5 Large text encoder outputs for Cosmos pipeline.

Return raw last_hidden_state without truncation/padding.

Source code in fastvideo/configs/pipelines/cosmos.py
def t5_large_postprocess_text(outputs: BaseEncoderOutput) -> torch.Tensor:
    """Postprocess T5 Large text encoder outputs for Cosmos pipeline.

    Return raw last_hidden_state without truncation/padding.
    """
    hidden_state = outputs.last_hidden_state

    if hidden_state is None:
        raise ValueError("T5 Large outputs missing last_hidden_state")

    nan_count = torch.isnan(hidden_state).sum()
    if nan_count > 0:
        hidden_state = hidden_state.masked_fill(torch.isnan(hidden_state), 0.0)

    # Zero out embeddings beyond actual sequence length
    if outputs.attention_mask is not None:
        attention_mask = outputs.attention_mask
        lengths = attention_mask.sum(dim=1).cpu()
        for i, length in enumerate(lengths):
            hidden_state[i, length:] = 0

    return hidden_state
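
A behavior sketch; SimpleNamespace stands in for BaseEncoderOutput purely for illustration, since only attribute access is exercised:

import torch
from types import SimpleNamespace
from fastvideo.configs.pipelines.cosmos import t5_large_postprocess_text

hidden = torch.randn(2, 8, 4)            # (batch, seq_len, hidden_dim)
mask = torch.tensor([[1] * 5 + [0] * 3,  # first prompt: 5 real tokens
                     [1] * 8])           # second prompt: full length
outputs = SimpleNamespace(last_hidden_state=hidden, attention_mask=mask)

emb = t5_large_postprocess_text(outputs)  # type: ignore[arg-type]
assert emb.shape == (2, 8, 4)             # no truncation or padding
assert emb[0, 5:].abs().sum() == 0        # positions past the mask are zeroed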

fastvideo.configs.pipelines.hunyuan

Classes

fastvideo.configs.pipelines.hunyuan.FastHunyuanConfig dataclass
FastHunyuanConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: int = 6, flow_shift: int = 17, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = HunyuanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = HunyuanVAEConfig(), vae_precision: str = 'fp16', vae_tiling: bool = True, vae_sp: bool = True, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (LlamaConfig(), CLIPTextConfig()))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp16', 'fp16'))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (llama_preprocess_text, clip_preprocess_text))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (llama_postprocess_text, clip_postprocess_text))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None)

Bases: HunyuanConfig

Configuration specifically optimized for FastHunyuan weights.

fastvideo.configs.pipelines.hunyuan.HunyuanConfig dataclass
HunyuanConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: int = 6, flow_shift: int = 7, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = HunyuanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = HunyuanVAEConfig(), vae_precision: str = 'fp16', vae_tiling: bool = True, vae_sp: bool = True, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (LlamaConfig(), CLIPTextConfig()))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp16', 'fp16'))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (llama_preprocess_text, clip_preprocess_text))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (llama_postprocess_text, clip_postprocess_text))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None)

Bases: PipelineConfig

Base configuration for HunYuan pipeline architecture.

fastvideo.configs.pipelines.hunyuan15

Classes

fastvideo.configs.pipelines.hunyuan15.Hunyuan15T2V480PConfig dataclass
Hunyuan15T2V480PConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: int = 5, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = HunyuanVideo15Config(), dit_precision: str = 'bf16', vae_config: VAEConfig = Hunyuan15VAEConfig(), vae_precision: str = 'fp16', vae_tiling: bool = True, vae_sp: bool = True, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (Qwen2_5_VLConfig(), T5Config()))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('bf16', 'fp32'))(), preprocess_text_funcs: tuple[Callable[[Any], Any], ...] = (lambda: (qwen_preprocess_text, byt5_preprocess_text))(), postprocess_text_funcs: tuple[Callable[..., Any], ...] = (lambda: (qwen_postprocess_text, byt5_postprocess_text))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, text_encoder_crop_start: int = PROMPT_TEMPLATE_TOKEN_LENGTH, text_encoder_max_lengths: tuple[int, ...] = (lambda: (1000 + PROMPT_TEMPLATE_TOKEN_LENGTH, 256))())

Bases: PipelineConfig

Base configuration for HunYuan pipeline architecture.

fastvideo.configs.pipelines.hunyuan15.Hunyuan15T2V720PConfig dataclass
Hunyuan15T2V720PConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: int = 9, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = HunyuanVideo15Config(), dit_precision: str = 'bf16', vae_config: VAEConfig = Hunyuan15VAEConfig(), vae_precision: str = 'fp16', vae_tiling: bool = True, vae_sp: bool = True, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (Qwen2_5_VLConfig(), T5Config()))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('bf16', 'fp32'))(), preprocess_text_funcs: tuple[Callable[[Any], Any], ...] = (lambda: (qwen_preprocess_text, byt5_preprocess_text))(), postprocess_text_funcs: tuple[Callable[..., Any], ...] = (lambda: (qwen_postprocess_text, byt5_postprocess_text))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, text_encoder_crop_start: int = PROMPT_TEMPLATE_TOKEN_LENGTH, text_encoder_max_lengths: tuple[int, ...] = (lambda: (1000 + PROMPT_TEMPLATE_TOKEN_LENGTH, 256))())

Bases: Hunyuan15T2V480PConfig

Base configuration for HunYuan pipeline architecture.

Functions

fastvideo.configs.pipelines.hunyuan15.extract_glyph_texts
extract_glyph_texts(prompt: str) -> str | None

Extract glyph texts from prompt using regex pattern.

Parameters:

Name Type Description Default
prompt str

Input prompt string

required

Returns:

Type Description
str | None

Formatted string of the extracted glyph texts, or None if the prompt contains no quoted text

Source code in fastvideo/configs/pipelines/hunyuan15.py
def extract_glyph_texts(prompt: str) -> str | None:
    """
    Extract glyph texts from prompt using regex pattern.

    Args:
        prompt: Input prompt string

    Returns:
        List of extracted glyph texts
    """
    pattern = r"\"(.*?)\"|“(.*?)”"
    matches = re.findall(pattern, prompt)
    result = [match[0] or match[1] for match in matches]
    result = list(dict.fromkeys(result)) if len(result) > 1 else result

    if result:
        formatted_result = ". ".join([f'Text "{text}"'
                                      for text in result]) + ". "
    else:
        formatted_result = None

    return formatted_result
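
For example (the outputs follow directly from the regex and formatting above):

from fastvideo.configs.pipelines.hunyuan15 import extract_glyph_texts

extract_glyph_texts('A neon sign reading "OPEN" above a door marked "EXIT"')
# -> 'Text "OPEN". Text "EXIT". '
extract_glyph_texts("No quoted text here")
# -> None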
fastvideo.configs.pipelines.hunyuan15.format_text_input
format_text_input(prompt: str, system_message: str) -> list[dict[str, Any]]

Apply text to template.

Parameters:

Name Type Description Default
prompt str

Input text.

required
system_message str

System message.

required

Returns:

Type Description
list[dict[str, Any]]

List[Dict[str, Any]]: The chat conversation as a list of role/content messages.

Source code in fastvideo/configs/pipelines/hunyuan15.py
def format_text_input(prompt: str, system_message: str) -> list[dict[str, Any]]:
    """
    Apply text to template.

    Args:
        prompt (List[str]): Input text.
        system_message (str): System message.

    Returns:
        List[Dict[str, Any]]: List of chat conversation.
    """

    template = [{
        "role": "system",
        "content": system_message
    }, {
        "role": "user",
        "content": prompt if prompt else " "
    }]

    return template
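
For example (the system message below is illustrative):

from fastvideo.configs.pipelines.hunyuan15 import format_text_input

conversation = format_text_input(
    prompt="A cat surfing a wave at sunset",
    system_message="Describe the video to generate.",  # illustrative
)
# -> [{'role': 'system', 'content': 'Describe the video to generate.'},
#     {'role': 'user', 'content': 'A cat surfing a wave at sunset'}]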

fastvideo.configs.pipelines.longcat

Classes

fastvideo.configs.pipelines.longcat.LongCatDiTArchConfig dataclass
LongCatDiTArchConfig(stacked_params_mapping: list[tuple[str, str, str]] = list(), _fsdp_shard_conditions: list = list(), _compile_conditions: list = list(), param_names_mapping: dict = dict(), reverse_param_names_mapping: dict = dict(), lora_param_names_mapping: dict = dict(), _supported_attention_backends: tuple[AttentionBackendEnum, ...] = (SLIDING_TILE_ATTN, SAGE_ATTN, FLASH_ATTN, TORCH_SDPA, VIDEO_SPARSE_ATTN, VMOBA_ATTN, SAGE_ATTN_THREE), hidden_size: int = 0, num_attention_heads: int = 0, num_channels_latents: int = 0, in_channels: int = 16, out_channels: int = 16, exclude_lora_layers: list[str] = list(), boundary_ratio: float | None = None, adaln_tembed_dim: int = 512, caption_channels: int = 4096, depth: int = 48, enable_bsa: bool = False, enable_flashattn3: bool = False, enable_flashattn2: bool = True, enable_xformers: bool = False, frequency_embedding_size: int = 256, mlp_ratio: int = 4, num_heads: int = 32, text_tokens_zero_pad: bool = True, patch_size: list[int] = (lambda: [1, 2, 2])(), cp_split_hw: list[int] | None = None, bsa_params: dict | None = None)

Bases: DiTArchConfig

Extended DiTArchConfig with LongCat-specific fields.

NOTE: This is for Phase 1 wrapper compatibility. For the native model (Phase 2), use LongCatVideoConfig from fastvideo.configs.models.dits.longcat instead.

fastvideo.configs.pipelines.longcat.LongCatT2V480PConfig dataclass
LongCatT2V480PConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = None, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = (lambda: DiTConfig(arch_config=LongCatDiTArchConfig()))(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'bf16', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[T5Config, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('bf16',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (longcat_preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (umt5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, enable_kv_cache: bool = True, offload_kv_cache: bool = False, enable_bsa: bool = False, use_distill: bool = False, enhance_hf: bool = False, bsa_params: dict | None = None, bsa_sparsity: float | None = None, bsa_cdf_threshold: float | None = None, bsa_chunk_q: list[int] | None = None, bsa_chunk_k: list[int] | None = None, t_thresh: float | None = None)

Bases: PipelineConfig

Configuration for LongCat pipeline (480p) aligned to LongCat-Video modules.

Components expected by loaders
  • tokenizer: AutoTokenizer
  • text_encoder: UMT5EncoderModel
  • transformer: LongCatVideoTransformer3DModel (Phase 1 wrapper) OR LongCatTransformer3DModel (Phase 2 native)
  • vae: AutoencoderKLWan (Wan VAE, 4x8 compression)
  • scheduler: FlowMatchEulerDiscreteScheduler
fastvideo.configs.pipelines.longcat.LongCatT2V704PConfig dataclass
LongCatT2V704PConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = None, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = (lambda: DiTConfig(arch_config=LongCatDiTArchConfig()))(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'bf16', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[T5Config, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('bf16',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (longcat_preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (umt5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, enable_kv_cache: bool = True, offload_kv_cache: bool = False, enable_bsa: bool = True, use_distill: bool = False, enhance_hf: bool = False, bsa_params: dict | None = None, bsa_sparsity: float | None = None, bsa_cdf_threshold: float | None = None, bsa_chunk_q: list[int] | None = None, bsa_chunk_k: list[int] | None = None, t_thresh: float | None = None)

Bases: LongCatT2V480PConfig

Configuration for LongCat pipeline (704p) with BSA enabled by default.

Uses the same resolution and BSA parameters as the original LongCat refinement stage. BSA parameters are configured in the transformer config.json with chunk_3d_shape=[4,4,4]:

- Input: 704×1280×96
- VAE (8x): 88×160×96
- Patch [1,2,2]: 44×80×96
- Chunk [4,4,4]: 96%4=0, 44%4=0, 80%4=0 ✅

This configuration matches the original LongCat refinement stage parameters.

Functions

fastvideo.configs.pipelines.longcat.longcat_preprocess_text
longcat_preprocess_text(prompt: str) -> str

Clean and preprocess text like original LongCat implementation.

This function applies the same text cleaning pipeline as the original LongCat-Video implementation to ensure identical tokenization results.

Steps:

1. basic_clean: Fix unicode issues and unescape HTML entities
2. whitespace_clean: Normalize whitespace to single spaces

Parameters:

Name Type Description Default
prompt str

Raw input text prompt

required

Returns:

Type Description
str

Cleaned and normalized text prompt

Source code in fastvideo/configs/pipelines/longcat.py
def longcat_preprocess_text(prompt: str) -> str:
    """Clean and preprocess text like original LongCat implementation.

    This function applies the same text cleaning pipeline as the original
    LongCat-Video implementation to ensure identical tokenization results.

    Steps:
    1. basic_clean: Fix unicode issues and unescape HTML entities
    2. whitespace_clean: Normalize whitespace to single spaces

    Args:
        prompt: Raw input text prompt

    Returns:
        Cleaned and normalized text prompt
    """
    # basic_clean: fix unicode and HTML entities
    text = ftfy.fix_text(prompt)
    text = html.unescape(html.unescape(text))
    text = text.strip()

    # whitespace_clean: normalize whitespace
    text = re.sub(r"\s+", " ", text)
    text = text.strip()

    return text
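
For example (HTML entities are unescaped and whitespace is collapsed):

from fastvideo.configs.pipelines.longcat import longcat_preprocess_text

longcat_preprocess_text("  A  cat &amp; a dog\n  on a couch ")
# -> 'A cat & a dog on a couch'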
fastvideo.configs.pipelines.longcat.umt5_postprocess_text
umt5_postprocess_text(outputs: BaseEncoderOutput) -> Tensor

Postprocess UMT5/T5 encoder outputs into fixed-length (512-token) embeddings.

Source code in fastvideo/configs/pipelines/longcat.py
def umt5_postprocess_text(outputs: BaseEncoderOutput) -> torch.Tensor:
    """
    Postprocess UMT5/T5 encoder outputs to fixed length 512 embeddings.
    """
    mask: torch.Tensor = outputs.attention_mask
    hidden_state: torch.Tensor = outputs.last_hidden_state
    seq_lens = mask.gt(0).sum(dim=1).long()
    assert torch.isnan(hidden_state).sum() == 0
    prompt_embeds = [u[:v] for u, v in zip(hidden_state, seq_lens, strict=True)]
    prompt_embeds_tensor: torch.Tensor = torch.stack([
        torch.cat([u, u.new_zeros(512 - u.size(0), u.size(1))])
        for u in prompt_embeds
    ],
                                                     dim=0)
    return prompt_embeds_tensor
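
A shape sketch; SimpleNamespace again stands in for BaseEncoderOutput, and the hidden dimension is kept small for illustration (the real encoder width is larger):

import torch
from types import SimpleNamespace
from fastvideo.configs.pipelines.longcat import umt5_postprocess_text

hidden = torch.randn(2, 300, 8)              # (batch, seq_len, hidden_dim)
mask = torch.zeros(2, 300, dtype=torch.long)
mask[0, :120] = 1                            # first prompt uses 120 tokens
mask[1, :300] = 1                            # second uses all 300
outputs = SimpleNamespace(last_hidden_state=hidden, attention_mask=mask)

emb = umt5_postprocess_text(outputs)         # type: ignore[arg-type]
assert emb.shape == (2, 512, 8)              # fixed 512-token length
assert emb[0, 120:].abs().sum() == 0         # zero padding past the real length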

fastvideo.configs.pipelines.registry

Registry for pipeline weight-specific configurations.

Classes

Functions

fastvideo.configs.pipelines.registry.get_pipeline_config_cls_from_name
get_pipeline_config_cls_from_name(pipeline_name_or_path: str) -> type[PipelineConfig]

Get the appropriate configuration class for a given pipeline name or path.

This function implements a multi-step lookup process to find the most suitable configuration class for a given pipeline. It follows this order:

1. Exact match in the PIPE_NAME_TO_CONFIG
2. Partial match in the PIPE_NAME_TO_CONFIG
3. Fallback to class name in the model_index.json
4. Otherwise, raise an error

Parameters:

Name Type Description Default
pipeline_name_or_path str

The name or path of the pipeline. This can be:

- A registered model ID (e.g., "FastVideo/FastHunyuan-diffusers")
- A local path to a model directory
- A model ID that will be downloaded

required

Returns:

Type Description
type[PipelineConfig]

Type[PipelineConfig]: The configuration class that best matches the pipeline. This will be one of:

- A specific weight configuration class if an exact match is found
- A fallback configuration class based on the pipeline architecture
- The base PipelineConfig class if no matches are found

Note
  • For local paths, the function will verify the model configuration
  • For remote models, it will attempt to download the model index
  • Warning messages are logged when falling back to less specific configurations
Source code in fastvideo/configs/pipelines/registry.py
def get_pipeline_config_cls_from_name(
        pipeline_name_or_path: str) -> type[PipelineConfig]:
    """Get the appropriate configuration class for a given pipeline name or path.

    This function implements a multi-step lookup process to find the most suitable
    configuration class for a given pipeline. It follows this order:
    1. Exact match in the PIPE_NAME_TO_CONFIG
    2. Partial match in the PIPE_NAME_TO_CONFIG
    3. Fallback to class name in the model_index.json
    4. else raise an error

    Args:
        pipeline_name_or_path (str): The name or path of the pipeline. This can be:
            - A registered model ID (e.g., "FastVideo/FastHunyuan-diffusers")
            - A local path to a model directory
            - A model ID that will be downloaded

    Returns:
        Type[PipelineConfig]: The configuration class that best matches the pipeline.
            This will be one of:
            - A specific weight configuration class if an exact match is found
            - A fallback configuration class based on the pipeline architecture
            - The base PipelineConfig class if no matches are found

    Note:
        - For local paths, the function will verify the model configuration
        - For remote models, it will attempt to download the model index
        - Warning messages are logged when falling back to less specific configurations
    """

    pipeline_config_cls: type[PipelineConfig] | None = None

    # First try exact match for specific weights
    if pipeline_name_or_path in PIPE_NAME_TO_CONFIG:
        pipeline_config_cls = PIPE_NAME_TO_CONFIG[pipeline_name_or_path]
        return pipeline_config_cls

    # Try partial matches (for local paths that might include the weight ID)
    for registered_id, config_class in PIPE_NAME_TO_CONFIG.items():
        if registered_id in pipeline_name_or_path:
            pipeline_config_cls = config_class
            break

    # If no match, try to use the fallback config
    if pipeline_config_cls is None:
        if os.path.exists(pipeline_name_or_path):
            config = verify_model_config_and_directory(pipeline_name_or_path)
        else:
            config = maybe_download_model_index(pipeline_name_or_path)
        logger.warning(
            "Trying to use the config from the model_index.json. FastVideo may not correctly identify the optimal config for this model in this situation."
        )

        pipeline_name = config["_class_name"]
        # Try to determine pipeline architecture for fallback
        for pipeline_type, detector in PIPELINE_DETECTOR.items():
            if detector(pipeline_name.lower()):
                pipeline_config_cls = PIPELINE_FALLBACK_CONFIG.get(
                    pipeline_type)
                break

        if pipeline_config_cls is not None:
            logger.warning(
                "No match found for pipeline %s, using fallback config %s.",
                pipeline_name_or_path, pipeline_config_cls)

    if pipeline_config_cls is None:
        raise ValueError(
            f"No match found for pipeline {pipeline_name_or_path}, please check the pipeline name or path."
        )

    return pipeline_config_cls

fastvideo.configs.pipelines.stepvideo

Classes

fastvideo.configs.pipelines.stepvideo.StepVideoT2VConfig dataclass
StepVideoT2VConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: int = 13, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = StepVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = StepVideoVAEConfig(), vae_precision: str = 'bf16', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (EncoderConfig(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (postprocess_text,))(), pos_magic: str = '超高清、HDR 视频、环境光、杜比全景声、画面稳定、流畅动作、逼真的细节、专业级构图、超现实主义、自然、生动、超细节、清晰。', neg_magic: str = '画面暗、低分辨率、不良手、文本、缺少手指、多余的手指、裁剪、低质量、颗粒状、签名、水印、用户名、模糊。', timesteps_scale: bool = False, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, precision: str = 'bf16')

Bases: PipelineConfig

Base configuration for StepVideo pipeline architecture.

fastvideo.configs.pipelines.wan

Classes

fastvideo.configs.pipelines.wan.FastWan2_1_T2V_480P_Config dataclass
FastWan2_1_T2V_480P_Config(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = 8.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = WanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = (lambda: [1000, 757, 522])(), ti2v_task: bool = False, boundary_ratio: float | None = None, precision: str = 'bf16', warp_denoising_step: bool = True)

Bases: WanT2V480PConfig

Base configuration for FastWan T2V 1.3B 480P pipeline architecture with DMD.

fastvideo.configs.pipelines.wan.WANV2VConfig dataclass
WANV2VConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = 3.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = WanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = WAN2_1ControlCLIPVisionConfig(), image_encoder_precision: str = 'bf16', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, precision: str = 'bf16', warp_denoising_step: bool = True)

Bases: WanI2V480PConfig

Configuration for WAN2.1 1.3B Control pipeline.

fastvideo.configs.pipelines.wan.WanI2V480PConfig dataclass
WanI2V480PConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = 3.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = WanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = CLIPVisionConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, precision: str = 'bf16', warp_denoising_step: bool = True)

Bases: WanT2V480PConfig

Base configuration for Wan I2V 14B 480P pipeline architecture.

fastvideo.configs.pipelines.wan.WanI2V720PConfig dataclass
WanI2V720PConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = 5.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = WanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = CLIPVisionConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, precision: str = 'bf16', warp_denoising_step: bool = True)

Bases: WanI2V480PConfig

Base configuration for Wan I2V 14B 720P pipeline architecture.

fastvideo.configs.pipelines.wan.WanT2V480PConfig dataclass
WanT2V480PConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = 3.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = WanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, precision: str = 'bf16', warp_denoising_step: bool = True)

Bases: PipelineConfig

Base configuration for Wan T2V 1.3B pipeline architecture.

fastvideo.configs.pipelines.wan.WanT2V720PConfig dataclass
WanT2V720PConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = 5.0, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = WanVideoConfig(), dit_precision: str = 'bf16', vae_config: VAEConfig = WanVAEConfig(), vae_precision: str = 'fp32', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (T5Config(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('fp32',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (t5_postprocess_text,))(), pos_magic: str | None = None, neg_magic: str | None = None, timesteps_scale: bool | None = None, mask_strategy_file_path: str | None = None, STA_mode: STA_Mode = STA_INFERENCE, skip_time_steps: int = 15, dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, precision: str = 'bf16', warp_denoising_step: bool = True)

Bases: WanT2V480PConfig

Base configuration for Wan T2V 14B 720P pipeline architecture.