Solution for RoPE Base Scaling Factor (λ)
The scaling factor λ for the base b of Rotary Positional Embeddings (RoPE) can be determined by solving the period-matching equation 2π · (λb)^((d - 2) / d) = (m / m_l) · 2π · b^((d - 2) / d). Cancelling the common factor of 2π, dividing both sides by b^((d - 2) / d), and raising both sides to the power d / (d - 2) isolates λ. The resulting formula, which adapts the embeddings to a different sequence length, is λ = (m / m_l)^(d / (d - 2)). Here, m represents the new sequence length, m_l is the original length, and d is the embedding dimensionality.
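A minimal sketch of this computation in Python (the function name is my own; the formula is the one stated above):

```python
# Sketch: RoPE base scaling factor lambda = (m / m_l) ** (d / (d - 2)),
# applied as b -> lambda * b when extending the context window.

def rope_scaling_factor(m: int, m_l: int, d: int) -> float:
    """Scaling factor for the RoPE base.

    m   : new (extended) maximum sequence length
    m_l : original maximum sequence length
    d   : embedding dimensionality (must exceed 2)
    """
    if d <= 2:
        raise ValueError("embedding dimensionality d must exceed 2")
    return (m / m_l) ** (d / (d - 2))

# Doubling the context (m = 2 * m_l) gives a factor slightly above 2,
# since the exponent d / (d - 2) is slightly greater than 1:
print(rope_scaling_factor(8192, 4096, 128))  # ~2.022
```

Note that λ is always a bit larger than the naive length ratio m / m_l, and the gap shrinks as d grows.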

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Related
An engineer is adapting a language model to handle longer text sequences. The goal is to find a scaling factor λ for the positional encoding base b. The method sets the period of the highest frequency component in the new, adapted model equal to the period of a model scaled by linear interpolation. The embedding dimensionality is d, the original sequence length is m_l, and the new sequence length is m. This constraint is captured by the equation 2π · (λb)^((d - 2) / d) = (m / m_l) · 2π · b^((d - 2) / d). Which part of this equation represents the period of the highest frequency dimension for the new model being developed?

An engineer is adapting a language model to process sequences twice as long as its original design (i.e., m = 2 * m_l). They use a method where the period of the highest frequency component in the new model is set equal to that of a linearly scaled model, captured by the equation above. Given that the embedding dimensionality d is greater than 2 and the original base b is a positive constant, how must the scaling factor λ change to satisfy this constraint for the new, longer sequence length?

Evaluating a Proposed Simplification
Learn After
An AI engineering team is adapting a language model to handle longer text inputs. The model was originally trained with a maximum sequence length of 4096 tokens and uses an embedding dimensionality of 128. To extend the model's context window to 16384 tokens, they must apply a scaling factor (λ) to the base of its rotary positional embeddings. Using the formula λ = (m / m_l)^(d / (d - 2)), where m is the new sequence length, m_l is the original length, and d is the dimensionality, what is the correct scaling factor to apply?

Impact of Embedding Dimensionality on RoPE Scaling
Influence of Dimensionality on RoPE Scaling Factor
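As a quick numeric check of the 4096 → 16384 worked example mentioned above, here is a sketch assuming the stated formula λ = (m / m_l)^(d / (d - 2)); the base value b = 10000 is my own assumption (a common RoPE default) and cancels out of λ itself:

```python
# Numeric check: m = 16384, m_l = 4096, d = 128.
m, m_l, d = 16384, 4096, 128
b = 10000.0  # assumed original base; not given in the question

lam = (m / m_l) ** (d / (d - 2))
print(round(lam, 3))  # slightly above the naive ratio m / m_l = 4

# Period-matching sanity check: with the new base lam * b, the matched
# component's period equals the linearly interpolated period of base b.
new_period = (lam * b) ** ((d - 2) / d)
interp_period = (m / m_l) * b ** ((d - 2) / d)
assert abs(new_period - interp_period) < 1e-6
```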