fastvideo.v1.layers.rotary_embedding#
Rotary Positional Embeddings.
Module Contents#
Classes#
| Class | Description |
|---|---|
| RotaryEmbedding | Original rotary positional embedding. |
Functions#
| Function | Description |
|---|---|
| get_1d_rotary_pos_embed | Precompute the frequency tensor for complex exponential (cis) with given dimensions. |
| get_meshgrid_nd | Get n-D meshgrid with start, stop and num. |
| get_nd_rotary_pos_embed | An n-d version of precompute_freqs_cis: RoPE for tokens with n-d structure, with sequence-parallel sharding support. |
| get_rope | |
| get_rotary_pos_embed | Generate rotary positional embeddings for the given sizes. |
API#
- class fastvideo.v1.layers.rotary_embedding.RotaryEmbedding(head_size: int, rotary_dim: int, max_position_embeddings: int, base: Union[int, float], is_neox_style: bool, dtype: torch.dtype)[source]#
Bases: fastvideo.v1.layers.custom_op.CustomOp
Original rotary positional embedding.
Initialization
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- forward_native(positions: torch.Tensor, query: torch.Tensor, key: torch.Tensor, offsets: Optional[torch.Tensor] = None) Tuple[torch.Tensor, torch.Tensor] [source]#
A PyTorch-native implementation of forward().
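A minimal usage sketch based on the documented signatures; the head count, shapes, and constructor values below are illustrative assumptions, not requirements of the class:

```python
import torch
from fastvideo.v1.layers.rotary_embedding import RotaryEmbedding

# Illustrative configuration: 8 heads of size 64, rotating the full head dim.
rope = RotaryEmbedding(
    head_size=64,
    rotary_dim=64,
    max_position_embeddings=2048,
    base=10000,
    is_neox_style=True,
    dtype=torch.float32,
)

positions = torch.arange(16)       # [num_tokens]
query = torch.randn(16, 8 * 64)    # [num_tokens, num_heads * head_size]
key = torch.randn(16, 8 * 64)

# Applies the rotary embedding to query and key at the given positions.
query, key = rope.forward_native(positions, query, key)
```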
- fastvideo.v1.layers.rotary_embedding.get_1d_rotary_pos_embed(dim: int, pos: Union[torch.FloatTensor, int], theta: float = 10000.0, theta_rescale_factor: float = 1.0, interpolation_factor: float = 1.0, dtype: torch.dtype = torch.float32) Tuple[torch.Tensor, torch.Tensor] [source]#
Precompute the frequency tensor for complex exponential (cis) with given dimensions. (Note: cis means cos + i * sin, where i is the imaginary unit.) This function calculates a frequency tensor with complex exponential using the given dimension 'dim' and the end index 'end'. The 'theta' parameter scales the frequencies.
- Parameters:
dim (int) – Dimension of the frequency tensor.
pos (int or torch.FloatTensor) – Position indices for the frequency tensor. [S] or scalar.
theta (float, optional) – Scaling factor for frequency computation. Defaults to 10000.0.
theta_rescale_factor (float, optional) – Rescale factor for theta. Defaults to 1.0.
interpolation_factor (float, optional) – Factor to scale positions. Defaults to 1.0.
- Returns:
Precomputed frequency tensor with real and imaginary parts separately. [S, D]
- Return type:
freqs_cos, freqs_sin
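A short sketch of the 1-D case; the sequence length and dimension below are arbitrary choices for illustration:

```python
from fastvideo.v1.layers.rotary_embedding import get_1d_rotary_pos_embed

# Frequencies for 32 positions with a 64-dim rotary embedding.
# Per the docstring, cos and sin are returned separately with shape [S, D].
freqs_cos, freqs_sin = get_1d_rotary_pos_embed(dim=64, pos=32)
```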
- fastvideo.v1.layers.rotary_embedding.get_meshgrid_nd(start: Union[int, Tuple[int, ...]], *args: Union[int, Tuple[int, ...]], dim: int = 2) torch.Tensor [source]#
Get n-D meshgrid with start, stop and num.
- Parameters:
start (int or tuple) – If len(args) == 0, start is num; if len(args) == 1, start is start and args[0] is stop, with step 1; if len(args) == 2, start is start, args[0] is stop, and args[1] is num. For n-dim, start/stop/num should each be an int or an n-tuple. If n-tuples are provided, the meshgrid is stacked following the dim order of the tuples.
*args – See above.
dim (int) – Dimension of the meshgrid. Defaults to 2.
- Returns:
[dim, β¦]
- Return type:
grid (torch.Tensor)
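For example, a sketch of the single-argument form, where start is interpreted as num (the grid sizes are arbitrary):

```python
from fastvideo.v1.layers.rotary_embedding import get_meshgrid_nd

# 3-D meshgrid over a (t, h, w) = (4, 8, 8) volume; per the docstring the
# axes are stacked along the leading dimension, giving shape [3, 4, 8, 8].
grid = get_meshgrid_nd((4, 8, 8), dim=3)
```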
- fastvideo.v1.layers.rotary_embedding.get_nd_rotary_pos_embed(rope_dim_list, start, *args, theta=10000.0, theta_rescale_factor: Union[float, List[float]] = 1.0, interpolation_factor: Union[float, List[float]] = 1.0, shard_dim: int = 0, sp_rank: int = 0, sp_world_size: int = 1, dtype: torch.dtype = torch.float32) Tuple[torch.Tensor, torch.Tensor] [source]#
This is an n-d version of precompute_freqs_cis: a RoPE for tokens with n-d structure. Supports sequence parallelism by allowing a specific dimension to be sharded.
- Parameters:
rope_dim_list (list of int) – Dimension of each rope. len(rope_dim_list) should equal n, and sum(rope_dim_list) should equal the head_dim of the attention layer.
start (int | tuple of int | list of int) – If len(args) == 0, start is num; if len(args) == 1, start is start and args[0] is stop, with step 1; if len(args) == 2, start is start, args[0] is stop, and args[1] is num.
*args – See above.
theta (float) – Scaling factor for frequency computation. Defaults to 10000.0.
theta_rescale_factor (float or list of float) – Rescale factor for theta. Defaults to 1.0.
interpolation_factor (float or list of float) – Factor to scale positions. Defaults to 1.0.
shard_dim (int) – Which dimension to shard for sequence parallelism. Defaults to 0.
sp_rank (int) – Rank in the sequence parallel group. Defaults to 0.
sp_world_size (int) – World size of the sequence parallel group. Defaults to 1.
- Returns:
(cos, sin) tensors of shape [HW, D/2]
- Return type:
Tuple[torch.Tensor, torch.Tensor]
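A sketch for a 3-D (video-like) token grid; the axis sizes and per-axis dims below are illustrative assumptions, with rope_dim_list summing to an assumed head_dim of 64:

```python
from fastvideo.v1.layers.rotary_embedding import get_nd_rotary_pos_embed

# Single-argument form: start=(4, 8, 8) gives the number of positions per axis.
cos, sin = get_nd_rotary_pos_embed(
    rope_dim_list=[16, 24, 24],  # sums to the assumed head_dim of 64
    start=(4, 8, 8),
)
# Per the return description, cos/sin cover all 4 * 8 * 8 = 256 positions.
```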
- fastvideo.v1.layers.rotary_embedding.get_rope(head_size: int, rotary_dim: int, max_position: int, base: Union[int, float], is_neox_style: bool = True, rope_scaling: Optional[Dict[str, Any]] = None, dtype: Optional[torch.dtype] = None, partial_rotary_factor: float = 1.0) fastvideo.v1.layers.rotary_embedding.RotaryEmbedding [source]#
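get_rope has no docstring here; judging only from the signature, it acts as a factory returning a RotaryEmbedding built from the listed arguments. A hedged sketch with illustrative values:

```python
import torch
from fastvideo.v1.layers.rotary_embedding import get_rope

# Illustrative values only. partial_rotary_factor < 1.0 presumably rotates
# only a fraction of the dimensions (an assumption from the parameter name).
rope = get_rope(
    head_size=64,
    rotary_dim=64,
    max_position=2048,
    base=10000,
    is_neox_style=True,
    dtype=torch.float32,
)
```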
- fastvideo.v1.layers.rotary_embedding.get_rotary_pos_embed(rope_sizes, hidden_size, heads_num, rope_dim_list, rope_theta, theta_rescale_factor=1.0, interpolation_factor=1.0, shard_dim: int = 0, dtype: torch.dtype = torch.float32) Tuple[torch.Tensor, torch.Tensor] [source]#
Generate rotary positional embeddings for the given sizes.
- Parameters:
rope_sizes – Tuple of dimensions (t, h, w).
hidden_size – Hidden dimension size.
heads_num – Number of attention heads.
rope_dim_list – List of dimensions for each axis, or None.
rope_theta – Base for frequency calculations.
theta_rescale_factor – Rescale factor for theta. Defaults to 1.0.
interpolation_factor – Factor to scale positions. Defaults to 1.0.
shard_dim – Which dimension to shard for sequence parallelism. Defaults to 0.
- Returns:
Tuple of (cos, sin) tensors for rotary embeddings
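A closing sketch for the video case; all sizes are illustrative assumptions:

```python
from fastvideo.v1.layers.rotary_embedding import get_rotary_pos_embed

# Embeddings for a (t, h, w) = (8, 16, 16) latent grid with head dim
# hidden_size // heads_num = 64. rope_dim_list=None leaves the per-axis
# split to the function, per the parameter description above.
cos, sin = get_rotary_pos_embed(
    rope_sizes=(8, 16, 16),
    hidden_size=1024,
    heads_num=16,
    rope_dim_list=None,
    rope_theta=10000,
)
```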