rotary_embedding
¶
Rotary Positional Embeddings.
Classes¶
fastvideo.layers.rotary_embedding.RotaryEmbedding
¶
RotaryEmbedding(head_size: int, rotary_dim: int, max_position_embeddings: int, base: int | float, is_neox_style: bool, dtype: dtype)
Bases: CustomOp
Original rotary positional embedding.
Source code in fastvideo/layers/rotary_embedding.py
Functions¶
fastvideo.layers.rotary_embedding.RotaryEmbedding.forward_native
¶
forward_native(positions: Tensor, query: Tensor, key: Tensor, offsets: Tensor | None = None) -> tuple[Tensor, Tensor]
A PyTorch-native implementation of forward().
Source code in fastvideo/layers/rotary_embedding.py
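Example (illustrative): a minimal sketch of the native path. The flattened [num_tokens, num_heads * head_size] query/key layout is an assumption borrowed from vLLM-style rotary layers, not documented on this page; verify against the source before relying on it.

```python
import torch
from fastvideo.layers.rotary_embedding import RotaryEmbedding

num_heads, head_size = 8, 64
rope = RotaryEmbedding(
    head_size=head_size,
    rotary_dim=head_size,          # rotate the full head dimension
    max_position_embeddings=4096,
    base=10000.0,
    is_neox_style=True,
    dtype=torch.float32,
)

# Assumed layout: q/k flattened to [num_tokens, num_heads * head_size].
positions = torch.arange(16)
query = torch.randn(16, num_heads * head_size)
key = torch.randn(16, num_heads * head_size)
query, key = rope.forward_native(positions, query, key)
```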
Functions¶
fastvideo.layers.rotary_embedding.apply_rotary_emb
¶
apply_rotary_emb(x: Tensor, freqs_cis: Tensor | tuple[Tensor, Tensor], use_real: bool = True, use_real_unbind_dim: int = -1) -> Tensor
Apply rotary embeddings to an input tensor using the given frequency tensor. This function applies rotary embeddings to the given query or key tensor 'x' using the provided frequency tensor 'freqs_cis'. The input tensor is reshaped as complex numbers, and the frequency tensor is reshaped for broadcasting compatibility. The resulting tensor contains rotary embeddings and is returned as a real tensor.
Args:
x (torch.Tensor): Query or key tensor to apply rotary embeddings. [B, H, S, D]
freqs_cis (Tensor | tuple[Tensor, Tensor]): Precomputed frequency tensor for complex exponentials. ([S, D], [S, D])
Returns:
torch.Tensor: The input tensor with rotary embeddings applied.
Source code in fastvideo/layers/rotary_embedding.py
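Example (illustrative): a sketch of the single-tensor call using the documented (cos, sin) tuple form of freqs_cis; the shapes follow the docstring above.

```python
import torch
from fastvideo.layers.rotary_embedding import (apply_rotary_emb,
                                               get_1d_rotary_pos_embed)

B, H, S, D = 1, 8, 16, 64
q = torch.randn(B, H, S, D)

# freqs_cis as a (cos, sin) tuple, each [S, D], per the docstring.
freqs_cos, freqs_sin = get_1d_rotary_pos_embed(D, S)
q_rot = apply_rotary_emb(q, (freqs_cos, freqs_sin), use_real=True)
# q_rot keeps the input shape [B, H, S, D].
```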
fastvideo.layers.rotary_embedding.get_1d_rotary_pos_embed
¶
get_1d_rotary_pos_embed(dim: int, pos: FloatTensor | int, theta: float = 10000.0, theta_rescale_factor: float = 1.0, interpolation_factor: float = 1.0, dtype: dtype = float32) -> tuple[Tensor, Tensor]
Precompute the frequency tensor for complex exponential (cis) with given dimensions.
(Note: cis means cos + i * sin, where i is the imaginary unit.)
This function calculates a frequency tensor with complex exponentials using the given dimension 'dim' and position indices 'pos'. The 'theta' parameter scales the frequencies.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dim` | `int` | Dimension of the frequency tensor. | *required* |
| `pos` | `int` or `FloatTensor` | Position indices for the frequency tensor. [S] or scalar | *required* |
| `theta` | `float` | Scaling factor for frequency computation. Defaults to 10000.0. | `10000.0` |
| `theta_rescale_factor` | `float` | Rescale factor for theta. Defaults to 1.0. | `1.0` |
| `interpolation_factor` | `float` | Factor to scale positions. Defaults to 1.0. | `1.0` |

Returns:

| Type | Description |
|---|---|
| `tuple[Tensor, Tensor]` | freqs_cos, freqs_sin: Precomputed frequency tensor with real and imaginary parts separately. [S, D] |
Source code in fastvideo/layers/rotary_embedding.py
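Example (illustrative): per the table above, an int pos is treated as a sequence length, while a FloatTensor supplies explicit positions; interpolation_factor scales those positions.

```python
import torch
from fastvideo.layers.rotary_embedding import get_1d_rotary_pos_embed

# Integer pos: positions 0..31; cos and sin are each [S, D] = [32, 64].
cos, sin = get_1d_rotary_pos_embed(dim=64, pos=32)

# Explicit positions with position interpolation (positions scaled by
# interpolation_factor, per the parameter description).
pos = torch.arange(64, dtype=torch.float32)
cos_i, sin_i = get_1d_rotary_pos_embed(64, pos, interpolation_factor=2.0)
```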
fastvideo.layers.rotary_embedding.get_meshgrid_nd
¶
get_meshgrid_nd(start, *args, dim=2)
Get n-D meshgrid with start, stop and num.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `start` | `int` or `tuple` | If len(args) == 0, start is num; if len(args) == 1, start is start, args[0] is stop, step is 1; if len(args) == 2, start is start, args[0] is stop, args[1] is num. For n-dim, start/stop/num should be int or n-tuple. If n-tuple is provided, the meshgrid will be stacked following the dim order in n-tuples. | *required* |
| `*args` | `int \| tuple[int, ...]` | See above. | `()` |
| `dim` | `int` | Dimension of the meshgrid. Defaults to 2. | `2` |

Returns:

| Name | Type | Description |
|---|---|---|
| `grid` | `ndarray` | [dim, ...] |
Source code in fastvideo/layers/rotary_embedding.py
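Example (illustrative): the single-argument form described above, where start alone is interpreted as the per-axis element count num.

```python
from fastvideo.layers.rotary_embedding import get_meshgrid_nd

# 3-D grid over (T, H, W) = (4, 8, 8); axes are stacked along dim 0,
# giving a grid of shape [3, 4, 8, 8].
grid = get_meshgrid_nd((4, 8, 8), dim=3)
```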
fastvideo.layers.rotary_embedding.get_nd_rotary_pos_embed
¶
get_nd_rotary_pos_embed(rope_dim_list, start, *args, theta=10000.0, theta_rescale_factor: float | list[float] = 1.0, interpolation_factor: float | list[float] = 1.0, shard_dim: int = 0, sp_rank: int = 0, sp_world_size: int = 1, dtype: dtype = float32, start_frame: int = 0) -> tuple[Tensor, Tensor]
This is an n-d version of precompute_freqs_cis, which is a RoPE for tokens with an n-d structure. Supports sequence parallelism by allowing a specific dimension to be sharded.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `rope_dim_list` | `list of int` | Dimension of each rope. len(rope_dim_list) should equal n, and sum(rope_dim_list) should equal the head_dim of the attention layer. | *required* |
| `start` | `int \| tuple of int \| list of int` | If len(args) == 0, start is num; if len(args) == 1, start is start, args[0] is stop, step is 1; if len(args) == 2, start is start, args[0] is stop, args[1] is num. | *required* |
| `*args` | | See above. | `()` |
| `theta` | `float` | Scaling factor for frequency computation. Defaults to 10000.0. | `10000.0` |
| `theta_rescale_factor` | `float` | Rescale factor for theta. Defaults to 1.0. | `1.0` |
| `interpolation_factor` | `float` | Factor to scale positions. Defaults to 1.0. | `1.0` |
| `shard_dim` | `int` | Which dimension to shard for sequence parallelism. Defaults to 0. | `0` |
| `sp_rank` | `int` | Rank in the sequence parallel group. Defaults to 0. | `0` |
| `sp_world_size` | `int` | World size of the sequence parallel group. Defaults to 1. | `1` |

Returns:

| Type | Description |
|---|---|
| `tuple[Tensor, Tensor]` | (cos, sin) tensors of shape [HW, D/2] |
Source code in fastvideo/layers/rotary_embedding.py
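Example (illustrative): a 3-D (t, h, w) case with made-up sizes; the per-axis split [16, 24, 24] (summing to a head_dim of 64) is an illustrative choice, not a default.

```python
from fastvideo.layers.rotary_embedding import get_nd_rotary_pos_embed

# rope_dim_list must sum to the attention head_dim: 16 + 24 + 24 = 64.
# With no extra positional args, start=(4, 8, 8) is the per-axis grid size.
cos, sin = get_nd_rotary_pos_embed([16, 24, 24], (4, 8, 8))
# cos/sin cover 4 * 8 * 8 = 256 positions with head_dim / 2 = 32 entries each.
```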
fastvideo.layers.rotary_embedding.get_rotary_pos_embed
¶
get_rotary_pos_embed(rope_sizes, hidden_size, heads_num, rope_dim_list, rope_theta, theta_rescale_factor=1.0, interpolation_factor=1.0, shard_dim: int = 0, dtype: dtype = float32, start_frame: int = 0) -> tuple[Tensor, Tensor]
Generate rotary positional embeddings for the given sizes.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `rope_sizes` | | Tuple of dimensions (t, h, w) | *required* |
| `hidden_size` | | Hidden dimension size | *required* |
| `heads_num` | | Number of attention heads | *required* |
| `rope_dim_list` | | List of dimensions for each axis, or None | *required* |
| `rope_theta` | | Base for frequency calculations | *required* |
| `theta_rescale_factor` | | Rescale factor for theta. Defaults to 1.0 | `1.0` |
| `interpolation_factor` | | Factor to scale positions. Defaults to 1.0 | `1.0` |
| `shard_dim` | `int` | Which dimension to shard for sequence parallelism. Defaults to 0. | `0` |

Returns:

| Type | Description |
|---|---|
| `tuple[Tensor, Tensor]` | Tuple of (cos, sin) tensors for rotary embeddings |
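Example (illustrative): hypothetical model dimensions (hidden_size=1024, heads_num=16, hence head_dim=64, split across (t, h, w) as [16, 24, 24]); only the argument names come from this page.

```python
from fastvideo.layers.rotary_embedding import get_rotary_pos_embed

cos, sin = get_rotary_pos_embed(
    rope_sizes=(4, 8, 8),          # (t, h, w) latent grid
    hidden_size=1024,
    heads_num=16,                  # head_dim = 1024 / 16 = 64
    rope_dim_list=[16, 24, 24],    # per-axis split; sums to head_dim
    rope_theta=10000.0,
)
```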