Calculation of RoPE Frequency Parameters
In Rotary Positional Embeddings (RoPE), the individual frequency parameters are calculated using a fixed base, commonly 10000. This approach is analogous to the frequency settings used in sinusoidal positional embeddings. The formula for the k-th frequency component is given by: $\theta_k = 10000^{-2(k-1)/d}$, where $d$ is the dimensionality of the embedding and $k$ is the index of the component, ranging from $1$ to $d/2$.
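
As a minimal sketch of the frequency calculation described above, the following Python snippet computes the vector of frequency parameters for a given embedding dimensionality, using the form θ_k = 10000^(-2(k-1)/d) that appears in the exercises on this page. The function name `rope_frequencies` and its `base` argument are illustrative assumptions, not part of any particular library.

```python
def rope_frequencies(d: int, base: float = 10000.0) -> list[float]:
    """Return [theta_1, ..., theta_{d/2}] with theta_k = base ** (-2 * (k - 1) / d).

    Hypothetical helper for illustration only; `d` is assumed to be a
    positive even integer so that dimensions can be grouped into d/2 pairs.
    """
    if d <= 0 or d % 2 != 0:
        raise ValueError("d must be a positive even integer")
    # k runs over the component indices 1, 2, ..., d/2.
    return [base ** (-2.0 * (k - 1) / d) for k in range(1, d // 2 + 1)]


if __name__ == "__main__":
    # For d = 8 there are four components; the first is exactly 1.0 and
    # each later one is smaller, so higher-index pairs rotate more slowly.
    for k, theta in enumerate(rope_frequencies(8), start=1):
        print(f"theta_{k} = {theta:.6g}")
```

With the default base, the first component is always 1 (exponent 0), and every later component is smaller than the one before it.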

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Related
Definition of the 2x2 RoPE Rotation Matrix Block
Calculation of RoPE Frequency Parameters
Exponential Form of RoPE Frequency Parameters
In a transformer model using Rotary Positional Embeddings, the transformation for each token depends on its position and a vector of frequency parameters, θ = [θ₁, ..., θ_{d/2}], where each component θ_k corresponds to a different 2-dimensional rotation. A researcher proposes a modification where all components of this vector are set to the same value (i.e., θ₁ = θ₂ = ... = θ_{d/2}). What is the most likely consequence of this change on the model's ability to represent positional information?
Effect of Modifying RoPE Frequency Parameters
In the implementation of Rotary Positional Embeddings (RoPE), the vector of frequency parameters, θ = [θ₁, ..., θ_{d/2}], ensures that for any given token position, all pairs of dimensions within its embedding are rotated by the same amount.
Calculation of RoPE Frequency Parameters
Formula for the Period of RoPE's Sine and Cosine Components
Consider the generalized formula for calculating a set of frequency parameters: θ_k = b^(-2(k-1)/d). In this formula, b is a configurable base greater than 1, d is the dimensionality (a positive integer), and k is the component index, which is an integer greater than 1. How would increasing the value of the base b affect the calculated frequency θ_k for any given k and d?
Determining the Base from a Frequency Parameter
Tuning Positional Embeddings for Long-Context Models
Learn After
An engineer is implementing a transformer model with an embedding dimensionality of d = 512. For the positional information, they use a method where frequency parameters θ_k are calculated using the formula: θ_k = 10000^(-2(k-1)/d). What is the correct value for the frequency parameter θ_k where the component index is k = 129?
Consider the formula for calculating frequency parameters in a positional embedding scheme: θ_k = 10000^(-2(k-1)/d), where d is the embedding dimension and k is the component index. According to this formula, as the component index k increases, the value of the frequency parameter θ_k also increases.
Impact of Base Value on Frequency Parameters