parameter

Classes

fastvideo.models.parameter.BasevLLMParameter

BasevLLMParameter(data: Tensor, weight_loader: Callable)

Bases: Parameter

Base parameter for vLLM linear layers. Extends torch.nn.Parameter by taking in a linear weight loader; the loaded weight is copied into the parameter when the provided weight loader is called.

Initialize the BasevLLMParameter

:param data: torch tensor with the parameter data
:param weight_loader: weight loader callable

:returns: a torch.nn.Parameter

Source code in fastvideo/models/parameter.py
def __init__(self, data: torch.Tensor, weight_loader: Callable):
    """
    Initialize the BasevLLMParameter

    :param data: torch tensor with the parameter data
    :param weight_loader: weight loader callable

    :returns: a torch.nn.Parameter
    """

    # During weight loading, we often do something like:
    # narrowed_tensor = param.data.narrow(0, offset, len)
    # narrowed_tensor.copy_(real_weight)
    # expecting narrowed_tensor and param.data to share the same storage.
    # However, on TPUs, narrowed_tensor will lazily propagate to the base
    # tensor, which is param.data, leading to redundant memory usage.
    # This sometimes causes OOM errors during model loading. To avoid this,
    # we sync the param tensor after its weight loader is called.
    from fastvideo.platforms import current_platform
    if current_platform.is_tpu():
        weight_loader = _make_synced_weight_loader(weight_loader)

    self._weight_loader = weight_loader
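
The snippet below is a minimal sketch of the weight-loader contract, not part of the FastVideo API: simple_loader and the tensor shapes are made up for illustration, and a real loader typically also handles tensor-parallel sharding.

import torch

from fastvideo.models.parameter import BasevLLMParameter

# Hypothetical loader: copies a checkpoint tensor into the parameter in place.
def simple_loader(param: torch.Tensor, loaded_weight: torch.Tensor) -> None:
    param.data.copy_(loaded_weight)

weight = BasevLLMParameter(data=torch.empty(16, 8), weight_loader=simple_loader)

# During checkpoint loading the registered loader is invoked with the
# parameter and the tensor read from disk, roughly equivalent to:
simple_loader(weight, torch.randn(16, 8))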

Functions

fastvideo.models.parameter.BlockQuantScaleParameter

BlockQuantScaleParameter(output_dim: int, **kwargs)

Bases: _ColumnvLLMParameter, RowvLLMParameter

Parameter class for weight scales loaded for weights with block-wise quantization. Uses both column and row parallelism.

Source code in fastvideo/models/parameter.py
def __init__(self, output_dim: int, **kwargs):
    self._output_dim = output_dim
    super().__init__(**kwargs)

fastvideo.models.parameter.ChannelQuantScaleParameter

ChannelQuantScaleParameter(output_dim: int, **kwargs)

Bases: _ColumnvLLMParameter

Parameter class for weight scales loaded for weights with channel-wise quantization. Equivalent to _ColumnvLLMParameter.

Source code in fastvideo/models/parameter.py
def __init__(self, output_dim: int, **kwargs):
    self._output_dim = output_dim
    super().__init__(**kwargs)

fastvideo.models.parameter.GroupQuantScaleParameter

GroupQuantScaleParameter(output_dim: int, **kwargs)

Bases: _ColumnvLLMParameter, RowvLLMParameter

Parameter class for weight scales loaded for weights with grouped quantization. Uses both column and row parallelism.

Source code in fastvideo/models/parameter.py
def __init__(self, output_dim: int, **kwargs):
    self._output_dim = output_dim
    super().__init__(**kwargs)

fastvideo.models.parameter.ModelWeightParameter

ModelWeightParameter(output_dim: int, **kwargs)

Bases: _ColumnvLLMParameter, RowvLLMParameter

Parameter class for linear layer weights. Uses both column and row parallelism.

Source code in fastvideo/models/parameter.py
def __init__(self, output_dim: int, **kwargs):
    self._output_dim = output_dim
    super().__init__(**kwargs)
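
For illustration, a sketch of how a linear weight might be constructed with both dimensions annotated; the shapes and the noop_loader stub are assumptions, not taken from FastVideo's layers.

import torch

from fastvideo.models.parameter import ModelWeightParameter

def noop_loader(param: torch.Tensor, loaded_weight: torch.Tensor) -> None:
    # Stand-in for a real weight loader (see BasevLLMParameter above).
    param.data.copy_(loaded_weight)

# A (out_features, in_features) weight: dim 0 is the output (column-parallel)
# dimension, dim 1 is the input (row-parallel) dimension.
w = ModelWeightParameter(data=torch.empty(1024, 512, dtype=torch.float16),
                         output_dim=0,
                         input_dim=1,
                         weight_loader=noop_loader)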

fastvideo.models.parameter.PackedColumnParameter

PackedColumnParameter(packed_factor: int | Fraction, packed_dim: int, **kwargs)

Bases: _ColumnvLLMParameter

Parameter for model parameters which are packed on disk and support column parallelism only. See PackedvLLMParameter for more details on the packed properties.

Source code in fastvideo/models/parameter.py
def __init__(self, packed_factor: int | Fraction, packed_dim: int,
             **kwargs):
    self._packed_factor = packed_factor
    self._packed_dim = packed_dim
    super().__init__(**kwargs)

fastvideo.models.parameter.PackedvLLMParameter

PackedvLLMParameter(packed_factor: int | Fraction, packed_dim: int, **kwargs)

Bases: ModelWeightParameter

Parameter for model weights which are packed on disk. Example: GPTQ Marlin weights are int4 or int8, packed into int32. Extends ModelWeightParameter to take in the packed factor, the packed dimension and, optionally, the Marlin tile size for Marlin kernels. Adjusts the shard_size and shard_offset during fused linear layer weight loading to account for packing and, optionally, the Marlin tile size.

Source code in fastvideo/models/parameter.py
def __init__(self, packed_factor: int | Fraction, packed_dim: int,
             **kwargs):
    self._packed_factor = packed_factor
    self._packed_dim = packed_dim
    super().__init__(**kwargs)
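
The shard bookkeeping described above boils down to scaling sizes and offsets along the packed dimension. The helper below is a rough sketch of that arithmetic; the names and the Marlin handling are illustrative, not the module's actual functions.

from fractions import Fraction

def adjust_for_packing(shard_size: int, shard_offset: int,
                       packed_factor: int | Fraction,
                       marlin_tile_size: int | None = None) -> tuple[int, int]:
    # e.g. eight int4 values packed into one int32 gives packed_factor == 8,
    # so sizes/offsets along the packed dim shrink by 8x.
    shard_size = int(shard_size // packed_factor)
    shard_offset = int(shard_offset // packed_factor)
    if marlin_tile_size is not None:
        # Marlin kernels additionally tile the packed dimension.
        shard_size *= marlin_tile_size
        shard_offset *= marlin_tile_size
    return shard_size, shard_offset

# A 4096-wide shard starting at offset 8192 in an int4-packed fused weight:
print(adjust_for_packing(4096, 8192, packed_factor=8))  # (512, 1024)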

fastvideo.models.parameter.PerTensorScaleParameter

PerTensorScaleParameter(**kwargs)

Bases: BasevLLMParameter

Parameter class for scales where the number of scales equals the number of logical matrices in fused linear layers (e.g. for QKV, there are 3 scales loaded from disk). This is relevant to weights with per-tensor quantization. Adds functionality to map the scales to a shard during weight loading.

Note: additional parameter manipulation may be handled per quantization config, within process_weights_after_loading.

Source code in fastvideo/models/parameter.py
def __init__(self, **kwargs):
    self.qkv_idxs = {"q": 0, "k": 1, "v": 2}
    super().__init__(**kwargs)
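
For illustration, a sketch of what the qkv_idxs mapping enables during loading: each per-tensor scale read from disk lands in the slot for its logical matrix. The load_scale helper here is hypothetical, not the class's actual loader.

import torch

# One scale per logical matrix in the fused QKV projection.
scales = torch.empty(3)
qkv_idxs = {"q": 0, "k": 1, "v": 2}

def load_scale(shard_id: str | int, loaded_scale: torch.Tensor) -> None:
    # String shard ids ("q"/"k"/"v") are mapped to an integer slot.
    idx = qkv_idxs[shard_id] if isinstance(shard_id, str) else shard_id
    scales[idx] = loaded_scale

load_scale("k", torch.tensor(0.02))  # writes the k-projection scale into scales[1]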

fastvideo.models.parameter.RowvLLMParameter

RowvLLMParameter(input_dim: int, **kwargs)

Bases: BasevLLMParameter

Parameter class defining weight-loading functionality (load_row_parallel_weight) for parameters being loaded into linear layers with row parallelism. Requires an input_dim to be defined.

Source code in fastvideo/models/parameter.py
def __init__(self, input_dim: int, **kwargs):
    self._input_dim = input_dim
    super().__init__(**kwargs)
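
Conceptually, a row-parallel load narrows the full checkpoint tensor along input_dim to the current tensor-parallel rank's slice and copies it in. The sketch below illustrates that idea only; it is not the class's load_row_parallel_weight implementation, and tp_rank/tp_size are stand-ins for the real distributed state.

import torch

def load_row_parallel(param_data: torch.Tensor, full_weight: torch.Tensor,
                      input_dim: int, tp_rank: int, tp_size: int) -> None:
    # Each rank owns a contiguous slice of the input dimension.
    shard = full_weight.size(input_dim) // tp_size
    param_data.copy_(full_weight.narrow(input_dim, tp_rank * shard, shard))

# Rank 1 of 4 copies columns 128..255 of a (512, 512) checkpoint weight.
local = torch.empty(512, 128)
load_row_parallel(local, torch.randn(512, 512), input_dim=1, tp_rank=1, tp_size=4)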

Functions

fastvideo.models.parameter.permute_param_layout_

permute_param_layout_(param: BasevLLMParameter, input_dim: int, output_dim: int, **kwargs) -> BasevLLMParameter

Permute a parameter's layout to the specified input and output dimensions. This is useful for forcing the parameter into a known layout: for example, if a packed (quantized) weight matrix must be in the layout {input_dim = 0, output_dim = 1, packed_dim = 0}, calling permute_param_layout_(x, input_dim=0, output_dim=1, packed_dim=0) ensures x is in that layout, permuting it if required and asserting if the layout cannot be reached.

Source code in fastvideo/models/parameter.py
def permute_param_layout_(param: BasevLLMParameter, input_dim: int,
                          output_dim: int, **kwargs) -> BasevLLMParameter:
    """
    Permute a parameter's layout to the specified input and output dimensions, 
    useful for forcing the parameter into a known layout, for example, if I need
    a packed (quantized) weight matrix to be in the layout 
        {input_dim = 0, output_dim = 1, packed_dim = 0}
    then I can call:
        permute_param_layout_(x, input_dim=0, output_dim=1, packed_dim=0)
    to ensure x is in the correct layout (permuting it to the correct layout if 
    required, asserting if it cannot get it to the correct layout)
    """

    curr_input_dim = getattr(param, "input_dim", None)
    curr_output_dim = getattr(param, "output_dim", None)

    if curr_input_dim is None or curr_output_dim is None:
        assert param.data.dim() == 2,\
            "permute_param_layout_ only supports 2D parameters when either "\
            "input_dim or output_dim is not set"

    # if one of the dimensions is not set, set it to the opposite of the other
    #  we can only do this since we asserted the parameter is 2D above
    if curr_input_dim is None:
        assert curr_output_dim is not None,\
            "either input or output dim must be set"
        curr_input_dim = (curr_output_dim + 1) % 2
    if curr_output_dim is None:
        assert curr_input_dim is not None,\
            "either input or output dim must be set"
        curr_output_dim = (curr_input_dim + 1) % 2

    # create permutation from the current layout to the layout with
    # self.input_dim at input_dim and self.output_dim at output_dim preserving
    # other dimensions
    perm = [
        i for i in range(param.data.dim())
        if i not in [curr_input_dim, curr_output_dim]
    ]
    perm.insert(input_dim, curr_input_dim)
    perm.insert(output_dim, curr_output_dim)

    if "packed_dim" in kwargs:
        assert hasattr(param, "packed_dim") and\
            param.packed_dim == perm[kwargs["packed_dim"]],\
            "permute_param_layout_ currently doesn't support repacking"

    param.data = param.data.permute(*perm)
    if hasattr(param, "_input_dim"):
        param._input_dim = input_dim
    if hasattr(param, "_output_dim"):
        param._output_dim = output_dim
    if "packed_dim" in kwargs and hasattr(param, "_packed_dim"):
        param._packed_dim = kwargs["packed_dim"]

    return param
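
An illustrative use of permute_param_layout_, assuming the parameter exposes its dimensions via the input_dim/output_dim attributes the function reads; the shapes and noop_loader stub are made up.

import torch

from fastvideo.models.parameter import (ModelWeightParameter,
                                         permute_param_layout_)

def noop_loader(param: torch.Tensor, loaded_weight: torch.Tensor) -> None:
    param.data.copy_(loaded_weight)

# Stored as (out_features, in_features): output_dim=0, input_dim=1.
w = ModelWeightParameter(data=torch.empty(1024, 512),
                         output_dim=0,
                         input_dim=1,
                         weight_loader=noop_loader)

# Force the layout a kernel expects: input on dim 0, output on dim 1.
w = permute_param_layout_(w, input_dim=0, output_dim=1)
assert w.data.shape == (512, 1024)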