Learn Before
Formula for Scaled RoPE Frequency Parameters (θ')
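When adapting Rotary Positional Embeddings (RoPE) for different sequence lengths, a new vector of frequency parameters, θ', is defined. Its components are computed from a scaling factor λ, a base term b, and the embedding dimension d:

$$\theta' = \left[ (\lambda b)^{-\frac{0}{d}},\ (\lambda b)^{-\frac{2}{d}},\ \ldots,\ (\lambda b)^{-\frac{d-2}{d}} \right]$$

That is, the i-th component is $\theta'_i = (\lambda b)^{-2(i-1)/d}$ for $i = 1, \ldots, d/2$.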
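The formula θ' = [(λb)^(-0/d), (λb)^(-2/d), ..., (λb)^(-(d-2)/d)] can be evaluated directly. The following is a minimal Python sketch. It assumes an even embedding dimension d, since RoPE rotates dimensions in pairs, and uses plain floating-point arithmetic. The function name `scaled_rope_frequencies` and the example value λ = 4 are hypothetical choices for illustration. Setting λ = 1 recovers the unscaled vector [b^(-0/d), b^(-2/d), ..., b^(-(d-2)/d)].

```python
def scaled_rope_frequencies(lam: float, b: float, d: int) -> list[float]:
    """Compute theta' = [(lam*b)^(-0/d), (lam*b)^(-2/d), ..., (lam*b)^(-(d-2)/d)].

    Hypothetical helper for illustration; assumes d is a positive even integer.
    """
    if d <= 0 or d % 2 != 0:
        raise ValueError("embedding dimension d must be a positive even integer")
    base = lam * b  # the scaling factor multiplies the base term
    # One frequency per pair of embedding dimensions: exponents 0, 2, ..., d-2.
    return [base ** (-k / d) for k in range(0, d, 2)]


# lam = 1 gives the original (unscaled) RoPE frequencies.
theta_orig = scaled_rope_frequencies(1.0, 10000, 128)
# Hypothetical scaling factor lam = 4.
theta_scaled = scaled_rope_frequencies(4.0, 10000, 128)

print(len(theta_scaled))  # 64  (d/2 components)
print(theta_scaled[0])    # 1.0 (the first component is always 1)
```

With λ > 1 the effective base λb grows, so every component after the first becomes smaller (a lower rotation frequency). With λ < 1 those components become larger instead.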
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Related
Origin of NTK-Aware Scaled RoPE
An engineer extends the context window of a language model that uses rotary positional embeddings. After modification, they find the model struggles with tasks requiring an understanding of long-range dependencies, as if the relative positioning of distant tokens is lost. Which of the following statements best analyzes the fundamental reason for this failure?
Two engineers are modifying a language model's Rotary Positional Embeddings (RoPE) to handle longer text sequences.
- Engineer A proposes modifying the core RoPE transformation function itself (creating a new function, Ro') while keeping the original positional angles (θ) the same.
- Engineer B proposes keeping the original RoPE transformation function (Ro) unchanged but applying it to a new, scaled set of positional angles (θ').
To ensure that the relative positional information is preserved correctly during this context extension, a key condition must be met: the outcome of the new system must be equivalent to the outcome of the original system applied to scaled positions. Based on this principle, which engineer's approach is more theoretically sound, and why?
Interpreting the RoPE Scaling Condition
Learn After
A language model's positional embeddings are being adapted for a new context length. The adaptation uses a scaling factor λ = 0.25, a base term b = 10000, and an embedding dimension d = 128. Based on the formula for the scaled frequency parameters, θ' = [(λb)^(-0/d), (λb)^(-2/d), ..., (λb)^(-(d-2)/d)], what is the approximate value of the second component in the θ' vector?
Evaluating RoPE Scaling for Context Extension
Impact of Scaling Factor on RoPE Frequencies