fastvideo.v1.layers.quantization.base_config#

Module Contents#

Classes#

QuantizationConfig

Base class for quantization configs.

QuantizeMethodBase

Base class for different quantized methods.

Functions#

method_has_implemented_embedding

Not all quant methods have embedding implemented, so we need to check that it exists for our given method. We check this by making sure the function has been changed from the base implementation.

API#

class fastvideo.v1.layers.quantization.base_config.QuantizationConfig[source]#

Bases: abc.ABC

Base class for quantization configs.

Initialization

abstract classmethod from_config(config: dict[str, Any]) → fastvideo.v1.layers.quantization.base_config.QuantizationConfig[source]#

Create a config class from the model’s quantization config.

get_cache_scale(name: str) → Optional[str][source]#

abstract static get_config_filenames() → list[str][source]#

List of filenames to search for in the model directory.

static get_from_keys(config: dict[str, Any], keys: list[str]) → Any[source]#

Get a value from the model’s quantization config.

static get_from_keys_or(config: dict[str, Any], keys: list[str], default: Any) → Any[source]#

Get an optional value from the model’s quantization config.
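As an illustration, the two key-lookup helpers can be sketched as plain dict utilities. This is a standalone reimplementation for clarity, not the actual fastvideo code; the exact error behavior (raising ValueError when no key matches) is an assumption:

```python
from typing import Any


def get_from_keys(config: dict[str, Any], keys: list[str]) -> Any:
    """Return the value of the first key found in the config; raise otherwise."""
    for key in keys:
        if key in config:
            return config[key]
    # Assumed behavior: fail loudly when none of the candidate keys exist.
    raise ValueError(f"None of the keys {keys} found in the quantization config.")


def get_from_keys_or(config: dict[str, Any], keys: list[str], default: Any) -> Any:
    """Like get_from_keys, but fall back to a default instead of raising."""
    try:
        return get_from_keys(config, keys)
    except ValueError:
        return default


cfg = {"bits": 4, "group_size": 128}
print(get_from_keys(cfg, ["bits", "wbits"]))        # 4
print(get_from_keys_or(cfg, ["zero_point"], True))  # True
```

Accepting a list of candidate keys lets one config class handle checkpoints whose quantization metadata uses slightly different field names.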

abstract classmethod get_min_capability() → int[source]#

Minimum GPU capability to support the quantization method.

E.g., 70 for Volta, 75 for Turing, 80 for Ampere. This requirement is due to the custom CUDA kernels used by the quantization method.

abstract get_name() → fastvideo.v1.layers.quantization.QuantizationMethods[source]#

Name of the quantization method.

abstract get_quant_method(layer: torch.nn.Module, prefix: str) → Optional[fastvideo.v1.layers.quantization.base_config.QuantizeMethodBase][source]#

Get the quantize method to use for the quantized layer.

Parameters:
  • layer – The layer for the quant method.

  • prefix – The full name of the layer in the state dict.

Returns:

The quantize method, or None if the given layer does not support a quant method.

abstract get_supported_act_dtypes() → list[torch.dtype][source]#

List of supported activation dtypes.

classmethod override_quantization_method(hf_quant_cfg, user_quant) → Optional[fastvideo.v1.layers.quantization.QuantizationMethods][source]#

Detects whether this quantization method can support a given checkpoint format by overriding the user-specified quantization method. This method should only be overridden by subclasses in exceptional circumstances.
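To make the interface concrete, here is a minimal sketch of what a subclass implementing QuantizationConfig's classmethods and statics could look like. The class name Int8Config, the config filename, and the field names are hypothetical, and the fastvideo abstract base is stood in by a plain class so the example is self-contained:

```python
from typing import Any


class Int8Config:
    """Hypothetical concrete config following the QuantizationConfig interface."""

    def __init__(self, group_size: int) -> None:
        self.group_size = group_size

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> "Int8Config":
        # Create a config instance from the model's quantization config dict.
        # Real subclasses typically use get_from_keys / get_from_keys_or here.
        return cls(group_size=config.get("group_size", 128))

    @staticmethod
    def get_config_filenames() -> list[str]:
        # Filenames to search for in the model directory (illustrative).
        return ["quantize_config.json"]

    @classmethod
    def get_min_capability(cls) -> int:
        # Minimum GPU compute capability: 75 = Turing (illustrative).
        return 75


cfg = Int8Config.from_config({"group_size": 64})
print(cfg.group_size)  # 64
```

get_quant_method and get_supported_act_dtypes are omitted here because they require torch; a real subclass must implement all the abstract members listed above.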

class fastvideo.v1.layers.quantization.base_config.QuantizeMethodBase[source]#

Bases: abc.ABC

Base class for different quantized methods.

abstract apply(layer: torch.nn.Module, *args, **kwargs) → torch.Tensor[source]#

Apply the weights in layer to the input tensor.

Expects create_weights to have been called on the layer beforehand.

abstract create_weights(layer: torch.nn.Module, *weight_args, **extra_weight_attrs)[source]#

Create weights for a layer.

The weights will be set as attributes of the layer.

abstract embedding(layer: torch.nn.Module, *args, **kwargs) → torch.Tensor[source]#

Gather embeddings in the layer based on indices in the input tensor.

Expects create_weights to have been called on the layer beforehand.

process_weights_after_loading(layer: torch.nn.Module) → None[source]#

Process the weights after loading.

This can be used, for example, to transpose weights for computation.
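The expected call order, create_weights, then process_weights_after_loading, then apply, can be sketched without torch as follows. ScaleOnlyMethod, FakeLayer, and the scale attribute are purely illustrative, not part of fastvideo:

```python
class FakeLayer:
    """Stand-in for a torch.nn.Module; weights become plain attributes."""


class ScaleOnlyMethod:
    """Illustrative quant method whose only 'weight' is a scalar scale."""

    def create_weights(self, layer: FakeLayer, scale: float) -> None:
        # The weights are set as attributes of the layer.
        layer.scale = scale

    def process_weights_after_loading(self, layer: FakeLayer) -> None:
        # Post-load processing, e.g. precomputing a form used at apply time
        # (analogous to transposing weights for computation).
        layer.inv_scale = 1.0 / layer.scale

    def apply(self, layer: FakeLayer, x: float) -> float:
        # Expects create_weights (and post-processing) to have run already.
        return x * layer.inv_scale


layer = FakeLayer()
method = ScaleOnlyMethod()
method.create_weights(layer, scale=4.0)
method.process_weights_after_loading(layer)
print(method.apply(layer, 8.0))  # 2.0
```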

fastvideo.v1.layers.quantization.base_config.method_has_implemented_embedding(method_class: type[fastvideo.v1.layers.quantization.base_config.QuantizeMethodBase]) → bool[source]#

Not all quant methods have embedding implemented, so we need to check that it exists for our given method. We check this by making sure the function has been changed from the base implementation.
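One plausible standalone sketch of this check compares the class's embedding attribute against the base class's placeholder; the actual fastvideo implementation may differ in detail:

```python
import inspect


class QuantizeMethodBase:
    """Simplified stand-in for the abstract base in this module."""

    def embedding(self, layer, *args, **kwargs):
        raise NotImplementedError


class WithEmbedding(QuantizeMethodBase):
    def embedding(self, layer, *args, **kwargs):
        return "gathered"


class WithoutEmbedding(QuantizeMethodBase):
    pass


def method_has_implemented_embedding(method_class: type) -> bool:
    # The method counts as implemented only if the class attribute differs
    # from the base implementation (getattr_static avoids descriptor binding,
    # so unbound functions are compared by identity).
    base = inspect.getattr_static(QuantizeMethodBase, "embedding", None)
    impl = inspect.getattr_static(method_class, "embedding", None)
    return impl is not None and impl is not base


print(method_has_implemented_embedding(WithEmbedding))     # True
print(method_has_implemented_embedding(WithoutEmbedding))  # False
```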