quantization
¶
Classes¶
Functions¶
fastvideo.layers.quantization.register_quantization_config
¶
register_quantization_config(quantization: str)
Register a customized vLLM quantization config.
When a quantization method is not supported by vLLM, you can register a customized quantization config to support it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| quantization | str | The quantization method name. | required |
Examples:
>>> from fastvideo.layers.quantization import register_quantization_config
>>> from fastvideo.layers.quantization import get_quantization_config
>>> from fastvideo.layers.quantization.base_config import QuantizationConfig
>>>
>>> @register_quantization_config("my_quant")
... class MyQuantConfig(QuantizationConfig):
...     pass
>>>
>>> get_quantization_config("my_quant")
<class 'MyQuantConfig'>
Source code in fastvideo/layers/quantization/__init__.py
Modules¶
fastvideo.layers.quantization.base_config
¶
Classes¶
fastvideo.layers.quantization.base_config.QuantizationConfig
¶
Bases: ABC
Base class for quantization configs.
Source code in fastvideo/layers/quantization/base_config.py
Functions¶
fastvideo.layers.quantization.base_config.QuantizationConfig.from_config
abstractmethod
classmethod
¶
from_config(config: dict[str, Any]) -> QuantizationConfig
Create a config class from the model's quantization config.
fastvideo.layers.quantization.base_config.QuantizationConfig.get_config_filenames
abstractmethod
staticmethod
¶
fastvideo.layers.quantization.base_config.QuantizationConfig.get_from_keys
staticmethod
¶
Get a value from the model's quantization config.
Source code in fastvideo/layers/quantization/base_config.py
fastvideo.layers.quantization.base_config.QuantizationConfig.get_from_keys_or
staticmethod
¶
Get an optional value from the model's quantization config.
Source code in fastvideo/layers/quantization/base_config.py
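This page does not show signatures for the two lookup helpers, so the following is only a minimal sketch of the behavior their descriptions suggest. It assumes each helper receives the config dict and a list of candidate keys; the parameter names, the function bodies, and the use of `ValueError` are assumptions, not the FastVideo source.

```python
from __future__ import annotations

from typing import Any


def get_from_keys(config: dict[str, Any], keys: list[str]) -> Any:
    """Return the value of the first candidate key present in config."""
    for key in keys:
        if key in config:
            return config[key]
    # Assumption: a missing required key is reported as a ValueError.
    raise ValueError(f"Cannot find any of {keys} in the quantization config.")


def get_from_keys_or(config: dict[str, Any], keys: list[str], default: Any) -> Any:
    """Same lookup, but fall back to default when no candidate key exists."""
    try:
        return get_from_keys(config, keys)
    except ValueError:
        return default


# Hypothetical checkpoint config used only for illustration.
config = {"bits": 4, "group_size": 128}
bits = get_from_keys(config, ["w_bit", "bits"])        # first match wins
sym = get_from_keys_or(config, ["sym"], default=True)  # key absent, default used
```

Accepting several candidate keys lets one config class read checkpoints that name the same setting differently.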
fastvideo.layers.quantization.base_config.QuantizationConfig.get_min_capability
abstractmethod
classmethod
¶
get_min_capability() -> int
Minimum GPU capability to support the quantization method.
E.g., 70 for Volta, 75 for Turing, 80 for Ampere. This requirement is due to the custom CUDA kernels used by the quantization method.
Source code in fastvideo/layers/quantization/base_config.py
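The integers in the examples above correspond to the CUDA compute capability written as `major * 10 + minor` (7.0 becomes 70). A small sketch of a capability check under that assumption; `capability_to_int` and `is_supported` are hypothetical helpers and are not part of FastVideo.

```python
from __future__ import annotations


def capability_to_int(major: int, minor: int) -> int:
    """Encode a (major, minor) compute capability as a single integer."""
    return major * 10 + minor


def is_supported(device_capability: tuple[int, int], min_capability: int) -> bool:
    """Check a device against the value a config's get_min_capability returns."""
    return capability_to_int(*device_capability) >= min_capability


# Hypothetical devices: Turing is 7.5, Ampere is 8.0.
turing = (7, 5)
ampere = (8, 0)

turing_ok = is_supported(turing, min_capability=80)  # below the minimum
ampere_ok = is_supported(ampere, min_capability=80)  # meets the minimum
```

On a machine with PyTorch and a CUDA device, the `(major, minor)` tuple could come from `torch.cuda.get_device_capability()`.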
fastvideo.layers.quantization.base_config.QuantizationConfig.get_name
abstractmethod
¶
fastvideo.layers.quantization.base_config.QuantizationConfig.get_quant_method
abstractmethod
¶
get_quant_method(layer: Module, prefix: str) -> QuantizeMethodBase | None
Get the quantize method to use for the quantized layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| layer | Module | The layer for the quant method. | required |
| prefix | str | The full name of the layer in the state dict. | required |
Returns: The quantize method, or None if the given layer does not support this quantization method.
Source code in fastvideo/layers/quantization/base_config.py
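A sketch of how a subclass might implement `get_quant_method`, written with stand-in classes so it runs without FastVideo or PyTorch installed. `Module`, `Linear`, `LayerNorm`, `MyLinearMethod`, `MyQuantConfig`, and the `skip_prefixes` option are all hypothetical; a real config would subclass `QuantizationConfig` and implement its other abstract methods as well.

```python
from __future__ import annotations


class Module:
    """Stand-in for torch.nn.Module."""


class Linear(Module):
    """Stand-in for a linear layer that the method can quantize."""


class LayerNorm(Module):
    """Stand-in for a layer type the method does not quantize."""


class MyLinearMethod:
    """Stand-in for a QuantizeMethodBase subclass."""

    def __init__(self, config: MyQuantConfig) -> None:
        self.config = config


class MyQuantConfig:
    """Hypothetical config; only get_quant_method is sketched here."""

    def __init__(self, skip_prefixes: list[str]) -> None:
        self.skip_prefixes = skip_prefixes

    def get_quant_method(self, layer: Module, prefix: str) -> MyLinearMethod | None:
        # prefix is the layer's full name in the state dict, so it can be
        # used to exclude specific layers from quantization.
        if any(prefix.startswith(p) for p in self.skip_prefixes):
            return None
        if isinstance(layer, Linear):
            return MyLinearMethod(self)
        # Unsupported layer type: signal "no quant method" with None.
        return None


cfg = MyQuantConfig(skip_prefixes=["proj_out"])
method = cfg.get_quant_method(Linear(), "blocks.0.attn.to_q")
skipped = cfg.get_quant_method(Linear(), "proj_out")
unsupported = cfg.get_quant_method(LayerNorm(), "blocks.0.norm1")
```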
fastvideo.layers.quantization.base_config.QuantizationConfig.get_supported_act_dtypes
abstractmethod
¶
get_supported_act_dtypes() -> list[dtype]
fastvideo.layers.quantization.base_config.QuantizationConfig.override_quantization_method
classmethod
¶
Detects whether this quantization method can support a given checkpoint format by overriding the user-specified quantization method. Subclasses should override this method only in exceptional circumstances.
Source code in fastvideo/layers/quantization/base_config.py
fastvideo.layers.quantization.base_config.QuantizeMethodBase
¶
Bases: ABC
Base class for different quantized methods.
Functions¶
fastvideo.layers.quantization.base_config.QuantizeMethodBase.apply
abstractmethod
¶
Apply the weights in layer to the input tensor.
Expects create_weights to have been called before on the layer.
Source code in fastvideo/layers/quantization/base_config.py
fastvideo.layers.quantization.base_config.QuantizeMethodBase.create_weights
abstractmethod
¶
Create weights for a layer.
The weights will be set as attributes of the layer.
fastvideo.layers.quantization.base_config.QuantizeMethodBase.embedding
¶
Gather embeddings in the layer based on indices in the input tensor.
Expects create_weights to have been called before on the layer.
Source code in fastvideo/layers/quantization/base_config.py
fastvideo.layers.quantization.base_config.QuantizeMethodBase.process_weights_after_loading
¶
Process the weights after loading.
This can be used, for example, to transpose weights for computation.
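The method descriptions imply a call order: `create_weights`, then checkpoint loading, then `process_weights_after_loading`, then `apply`. The sketch below walks through that order with plain Python lists instead of tensors. The method signatures are not shown on this page, so the parameters used here, along with `Layer` and `TransposedLinearMethod`, are assumptions for illustration only.

```python
from __future__ import annotations


class Layer:
    """Stand-in for torch.nn.Module; weights are stored as attributes."""


class TransposedLinearMethod:
    """Hypothetical method that keeps weights transposed for computation."""

    def create_weights(self, layer: Layer, in_features: int, out_features: int) -> None:
        # The weights are set as attributes of the layer, shape [out][in].
        layer.weight = [[0.0] * in_features for _ in range(out_features)]

    def process_weights_after_loading(self, layer: Layer) -> None:
        # Transpose once after loading: [out][in] becomes [in][out].
        layer.weight = [list(col) for col in zip(*layer.weight)]

    def apply(self, layer: Layer, x: list[float]) -> list[float]:
        # Expects create_weights (and loading) to have run on the layer.
        n_out = len(layer.weight[0])
        return [
            sum(x[i] * layer.weight[i][j] for i in range(len(x)))
            for j in range(n_out)
        ]


layer = Layer()
method = TransposedLinearMethod()
method.create_weights(layer, in_features=2, out_features=3)
# Pretend a checkpoint loader filled in the [out][in] weights.
layer.weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
method.process_weights_after_loading(layer)
y = method.apply(layer, [2.0, 3.0])
```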
Functions¶
fastvideo.layers.quantization.base_config.method_has_implemented_embedding
¶
method_has_implemented_embedding(method_class: type[QuantizeMethodBase]) -> bool
Not all quant methods have embedding implemented, so we need to check that it exists for our given method. We check this by making sure the function has been changed from the base implementation.
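One way to perform the check described above is to compare the raw `embedding` attribute of the subclass with that of the base class. The sketch below uses a stand-in `QuantizeMethodBase` so that it runs on its own; the function body is an assumption about how such a check could be written, not a copy of the FastVideo source.

```python
import inspect


class QuantizeMethodBase:
    """Stand-in for the real base class; only embedding matters here."""

    def embedding(self, layer, *args, **kwargs):
        raise NotImplementedError


class LinearOnlyMethod(QuantizeMethodBase):
    """Hypothetical method that inherits the base embedding unchanged."""


class EmbeddingMethod(QuantizeMethodBase):
    """Hypothetical method that overrides embedding."""

    def embedding(self, layer, *args, **kwargs):
        return "gathered"


def method_has_implemented_embedding(method_class: type) -> bool:
    # getattr_static returns the raw function object without binding it,
    # so an inherited method is identical to the base implementation.
    base_embedding = inspect.getattr_static(QuantizeMethodBase, "embedding", None)
    class_embedding = inspect.getattr_static(method_class, "embedding", None)
    return class_embedding is not None and class_embedding is not base_embedding


inherits_base = method_has_implemented_embedding(LinearOnlyMethod)
overrides = method_has_implemented_embedding(EmbeddingMethod)
```

Comparing by identity avoids calling `embedding`, so the check works on the class alone and needs no layer or input tensor.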