fastvideo.v1.layers.rotary_embedding#

Rotary Positional Embeddings.

Module Contents#

Classes#

RotaryEmbedding

Original rotary positional embedding.

Functions#

get_1d_rotary_pos_embed

Precompute the frequency tensor for complex exponentials (cis) with the given dimensions. (Note: cis means cos + i * sin, where i is the imaginary unit.)

get_meshgrid_nd

Get n-D meshgrid with start, stop and num.

get_nd_rotary_pos_embed

An n-d version of precompute_freqs_cis: RoPE for tokens with n-d structure. Supports sequence parallelism by allowing a specific dimension to be sharded.

get_rope

get_rotary_pos_embed

Generate rotary positional embeddings for the given sizes.

API#

class fastvideo.v1.layers.rotary_embedding.RotaryEmbedding(head_size: int, rotary_dim: int, max_position_embeddings: int, base: Union[int, float], is_neox_style: bool, dtype: torch.dtype)[source]#

Bases: fastvideo.v1.layers.custom_op.CustomOp

Original rotary positional embedding.

Initialization

Initialize internal Module state, shared by both nn.Module and ScriptModule.

extra_repr() → str[source]#
forward_native(positions: torch.Tensor, query: torch.Tensor, key: torch.Tensor, offsets: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor][source]#

A PyTorch-native implementation of forward().
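For illustration, a minimal usage sketch based on the constructor and forward_native signatures above; the flattened [num_tokens, num_heads * head_size] layout for query and key is an assumption, not a documented guarantee:

import torch
from fastvideo.v1.layers.rotary_embedding import RotaryEmbedding

head_size = 64
rope = RotaryEmbedding(
    head_size=head_size,
    rotary_dim=head_size,            # rotate the full head dimension
    max_position_embeddings=4096,
    base=10000,
    is_neox_style=True,
    dtype=torch.float32,
)

num_tokens, num_heads = 16, 8
positions = torch.arange(num_tokens)
# Assumed layout: [num_tokens, num_heads * head_size]
query = torch.randn(num_tokens, num_heads * head_size)
key = torch.randn(num_tokens, num_heads * head_size)

# Apply the rotation to query and key in one call.
query, key = rope.forward_native(positions, query, key)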

fastvideo.v1.layers.rotary_embedding.get_1d_rotary_pos_embed(dim: int, pos: Union[torch.FloatTensor, int], theta: float = 10000.0, theta_rescale_factor: float = 1.0, interpolation_factor: float = 1.0, dtype: torch.dtype = torch.float32) → Tuple[torch.Tensor, torch.Tensor][source]#

Precompute the frequency tensor for complex exponentials (cis) with the given dimensions. (Note: cis means cos + i * sin, where i is the imaginary unit.)

This function calculates a frequency tensor of complex exponentials for the given dimension 'dim' and the position indices 'pos'. The 'theta' parameter scales the frequencies.

Parameters:
  • dim (int) – Dimension of the frequency tensor.

  • pos (int or torch.FloatTensor) – Position indices for the frequency tensor. [S] or scalar

  • theta (float, optional) – Scaling factor for frequency computation. Defaults to 10000.0.

  • theta_rescale_factor (float, optional) – Rescale factor for theta. Defaults to 1.0.

  • interpolation_factor (float, optional) – Factor to scale positions. Defaults to 1.0.

Returns:

freqs_cos, freqs_sin – precomputed cosine and sine frequency tensors, each of shape [S, D].

Return type:

Tuple[torch.Tensor, torch.Tensor]
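For example, a sketch based on the documented signature and return shape:

from fastvideo.v1.layers.rotary_embedding import get_1d_rotary_pos_embed

# 32 positions over a 64-dim frequency tensor; per the docs each
# returned tensor has shape [S, D] = [32, 64].
freqs_cos, freqs_sin = get_1d_rotary_pos_embed(dim=64, pos=32, theta=10000.0)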

fastvideo.v1.layers.rotary_embedding.get_meshgrid_nd(start: Union[int, Tuple[int, ...]], *args: Union[int, Tuple[int, ...]], dim: int = 2) → torch.Tensor[source]#

Get n-D meshgrid with start, stop and num.

Parameters:
  • start (int or tuple) – Interpreted together with *args: if len(args) == 0, start is num; if len(args) == 1, start is start, args[0] is stop, and the step is 1; if len(args) == 2, start is start, args[0] is stop, and args[1] is num. For n dimensions, start/stop/num may each be an int or an n-tuple; when n-tuples are given, the meshgrid is stacked following the dimension order of the tuples.

  • *args – See above.

  • dim (int) – Dimension of the meshgrid. Defaults to 2.

Returns:

[dim, …]

Return type:

grid (torch.Tensor)
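The start/stop/num overloading is easiest to see by example (a sketch based on the documented signature; the shape comments follow the [dim, …] return description):

from fastvideo.v1.layers.rotary_embedding import get_meshgrid_nd

# One positional argument: start is interpreted as num (grid size per axis).
# A 3-D (t, h, w) grid of size (4, 8, 8) -> shape [3, 4, 8, 8].
grid = get_meshgrid_nd((4, 8, 8), dim=3)

# Three positional arguments: start, stop, num per axis -> shape [2, 8, 8].
grid2 = get_meshgrid_nd((0, 0), (4, 4), (8, 8), dim=2)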

fastvideo.v1.layers.rotary_embedding.get_nd_rotary_pos_embed(rope_dim_list, start, *args, theta=10000.0, theta_rescale_factor: Union[float, List[float]] = 1.0, interpolation_factor: Union[float, List[float]] = 1.0, shard_dim: int = 0, sp_rank: int = 0, sp_world_size: int = 1, dtype: torch.dtype = torch.float32) → Tuple[torch.Tensor, torch.Tensor][source]#

An n-d version of precompute_freqs_cis: RoPE for tokens with n-d structure. Supports sequence parallelism by allowing a specific dimension to be sharded.

Parameters:
  • rope_dim_list (list of int) – Dimension of each rope axis. len(rope_dim_list) should equal n, and sum(rope_dim_list) should equal the head_dim of the attention layer.

  • start (int | tuple of int | list of int) – Interpreted together with *args: if len(args) == 0, start is num; if len(args) == 1, start is start, args[0] is stop, and the step is 1; if len(args) == 2, start is start, args[0] is stop, and args[1] is num.

  • *args – See above.

  • theta (float) – Scaling factor for frequency computation. Defaults to 10000.0.

  • theta_rescale_factor (float) – Rescale factor for theta. Defaults to 1.0.

  • interpolation_factor (float) – Factor to scale positions. Defaults to 1.0.

  • shard_dim (int) – Which dimension to shard for sequence parallelism. Defaults to 0.

  • sp_rank (int) – Rank in the sequence parallel group. Defaults to 0.

  • sp_world_size (int) – World size of the sequence parallel group. Defaults to 1.

Returns:

(cos, sin) tensors of shape [HW, D/2]

Return type:

Tuple[torch.Tensor, torch.Tensor]
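A sketch of a 3-D (t, h, w) call based on the signature above; the rope_dim_list split is illustrative and only needs to sum to the attention head_dim:

from fastvideo.v1.layers.rotary_embedding import get_nd_rotary_pos_embed

# head_dim = 16 + 24 + 24 = 64, split across the (t, h, w) axes.
cos, sin = get_nd_rotary_pos_embed(
    rope_dim_list=[16, 24, 24],
    start=(4, 8, 8),      # single positional argument: grid size per axis
    theta=10000.0,
    shard_dim=0,          # shard the temporal axis under sequence parallelism
    sp_rank=0,
    sp_world_size=1,
)
# Per the documented return shape, cos and sin should each be
# [prod(grid sizes), head_dim / 2] = [256, 32] here (an inference
# from the [HW, D/2] description for the 3-D case).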

fastvideo.v1.layers.rotary_embedding.get_rope(head_size: int, rotary_dim: int, max_position: int, base: Union[int, float], is_neox_style: bool = True, rope_scaling: Optional[Dict[str, Any]] = None, dtype: Optional[torch.dtype] = None, partial_rotary_factor: float = 1.0) → fastvideo.v1.layers.rotary_embedding.RotaryEmbedding[source]#
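Judging from its signature and return annotation, get_rope constructs a RotaryEmbedding; rope_scaling and partial_rotary_factor presumably select scaled or partial-rotary variants, though this is not documented. A minimal sketch:

import torch
from fastvideo.v1.layers.rotary_embedding import get_rope

rope = get_rope(
    head_size=64,
    rotary_dim=64,        # full rotary dimension (no partial rotation)
    max_position=4096,
    base=10000,
    is_neox_style=True,
    dtype=torch.float32,
)
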
fastvideo.v1.layers.rotary_embedding.get_rotary_pos_embed(rope_sizes, hidden_size, heads_num, rope_dim_list, rope_theta, theta_rescale_factor=1.0, interpolation_factor=1.0, shard_dim: int = 0, dtype: torch.dtype = torch.float32) → Tuple[torch.Tensor, torch.Tensor][source]#

Generate rotary positional embeddings for the given sizes.

Parameters:
  • rope_sizes – Tuple of dimensions (t, h, w)

  • hidden_size – Hidden dimension size

  • heads_num – Number of attention heads

  • rope_dim_list – List of dimensions for each axis, or None

  • rope_theta – Base for frequency calculations

  • theta_rescale_factor – Rescale factor for theta. Defaults to 1.0

  • interpolation_factor – Factor to scale positions. Defaults to 1.0

  • shard_dim – Which dimension to shard for sequence parallelism. Defaults to 0.

Returns:

Tuple of (cos, sin) tensors for rotary embeddings
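A sketch for a video latent based on the parameter list above; the sizes are illustrative, and the claim that the per-axis split is chosen internally when rope_dim_list is None is an assumption drawn from the "or None" wording:

from fastvideo.v1.layers.rotary_embedding import get_rotary_pos_embed

# 8 frames of 16x16 spatial tokens; head_dim = 1024 // 16 = 64.
cos, sin = get_rotary_pos_embed(
    rope_sizes=(8, 16, 16),
    hidden_size=1024,
    heads_num=16,
    rope_dim_list=None,   # assumed: let the function pick the (t, h, w) split
    rope_theta=10000,
)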