Learn Before
Formula for RoPE with Linear Positional Interpolation
The implementation of linear positional interpolation in Rotary Positional Embeddings (RoPE) modifies the input to the embedding function. The new function, denoted as Ro', scales the position parameter by a factor of m_l / m, and is given by: Ro'(x_i, iθ) = Ro(x_i, (m_l / m) * iθ), where m_l is the original sequence length and m is the new extended sequence length.
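A minimal NumPy sketch of this scaling, assuming the standard RoPE frequency convention θ_k = base^(-2k/d) with base 10000 (the base, helper names, and pairing scheme are illustrative assumptions, not taken from the text):

```python
import numpy as np

def rope_angles(position, dim, base=10000.0):
    # Rotation angles i * θ_k for one position, with the usual
    # RoPE frequencies θ_k = base^(-2k/dim) (an assumed convention).
    k = np.arange(dim // 2)
    theta = base ** (-2.0 * k / dim)
    return position * theta

def interpolated_rope_angles(position, dim, original_max_len, new_max_len):
    # Linear positional interpolation: Ro'(x_i, iθ) = Ro(x_i, (m_l / m) * iθ).
    # The position index is scaled by m_l / m before the angles are formed,
    # so positions in the extended window map back into the trained range.
    scale = original_max_len / new_max_len
    return rope_angles(scale * position, dim)

# Example: position 4096 in an 8192-token window behaves like position 1024
# under the original 2048-token model.
assert np.allclose(
    interpolated_rope_angles(4096, 64, original_max_len=2048, new_max_len=8192),
    rope_angles(1024, 64),
)
```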

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Related
A researcher defines a new rotary position embedding function, Ro_new, for a token x_i at position i. The new function is defined as Ro_new(x_i, iθ) = Ro(x_i, (i+c)θ), where Ro is the original function and c is a constant offset. According to the general equivalence principle, this can be written as Ro_new(x_i, iθ) = Ro(x_i, iθ'). What is the correct expression for the transformed position parameter iθ'? (A worked reading of this equivalence follows this list.)
RoPE Scaling Transformation Equivalence
Equivalence of RoPE Modification Strategies
Analysis of a Flawed RoPE Modification
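Taking the offset card above at face value, the transformed parameter can be read off directly from the definition: Ro_new(x_i, iθ) = Ro(x_i, (i+c)θ) = Ro(x_i, iθ') implies iθ' = (i+c)θ, i.e. iθ' = iθ + cθ (a constant phase offset rather than a rescaling).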
Learn After
A language model, originally trained with a maximum sequence length of 2048, is being adapted to handle a new maximum length of 8192 using linear positional interpolation. This technique modifies the rotary position embedding function (Ro) by scaling the position index. The new function, Ro', for a token x_i at position i in the longer sequence, is equivalent to applying the original function Ro with a scaled positional argument: Ro'(x_i, iθ) = Ro(x_i, (original_max_length / new_max_length) * iθ). For a token at position i = 4096 in the new, extended context, what is the scaled positional argument that would be passed to the original Ro function?
Adapting a Language Model for Longer Sequences
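A quick arithmetic check for this question, written as a minimal Python sketch (variable names are ours, not from the card):

```python
original_max_length = 2048
new_max_length = 8192
i = 4096

scale = original_max_length / new_max_length  # 2048 / 8192 = 0.25
scaled_position = scale * i                   # 0.25 * 4096 = 1024.0

print(scaled_position)  # 1024.0, i.e. the original Ro receives 1024θ
```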
A language model's context window is being extended from an original maximum length m_l to a new, larger maximum length m. The technique used modifies the rotary position embedding function (Ro) by scaling the position index i according to the formula: New Effective Position = (m_l / m) * i. This formula implies that the effective position of the token at the very end of the new, extended context (position m - 1) is (m_l / m) * (m - 1) = m_l - m_l/m. Because m_l/m < 1, this value exceeds m_l - 1, so the last token is mapped to a position that falls just outside the range of the original model's trained positions, [0, m_l - 1].
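A small numeric check of this boundary observation, reusing the 2048 → 8192 figures from the previous card (plain Python, names ours):

```python
m_l, m = 2048, 8192             # original and extended maximum lengths
i_last = m - 1                  # final position in the extended window: 8191
effective = (m_l / m) * i_last  # 0.25 * 8191 = 2047.75

print(effective)                # 2047.75
print(effective > m_l - 1)      # True: just past the trained range [0, 2047]
```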