RoPE Scaling Transformation Equivalence
The scaling of Rotary Positional Embeddings (RoPE) can be conceptualized as a transformation of the rotation angle. A scaled rotation function, Ro_new, applied to an embedding x_i with its original angle iθ, is equivalent to applying the original rotation function, Ro, with a transformed angle iθ'. This equivalence is captured by the formula Ro_new(x_i, iθ) = Ro(x_i, iθ'). This principle shows that adapting RoPE to different sequence lengths amounts to adjusting the rotation angles applied to the embeddings.
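As a concrete illustration, the sketch below (a hypothetical NumPy example, not taken from the course material) instantiates the equivalence for linear positional interpolation with an assumed scale factor s: the scaled rotation Ro_new(x_i, iθ) = Ro(x_i, (i/s)θ) gives the same result as the original rotation Ro applied with the transformed angle iθ' = i(θ/s). The helper rope_rotate and all dimensions are illustrative choices.

```python
import numpy as np

def rope_rotate(x, angle):
    """Original rotation Ro: rotate each 2-D pair of features in x by the given angles."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angle), np.sin(angle)
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

d = 8                                           # head dimension (must be even)
theta = 10000.0 ** (-np.arange(0, d, 2) / d)    # per-pair base angles, shape (d/2,)
i, s = 6000, 4.0                                # position beyond the trained range, scale factor (assumed)

x = np.random.randn(d)

# View 1: the "new" scaled rotation Ro_new(x_i, i*theta) = Ro(x_i, (i/s)*theta).
scaled = rope_rotate(x, (i / s) * theta)

# View 2: the original rotation Ro applied with the transformed angle i*theta', theta' = theta / s.
equivalent = rope_rotate(x, i * (theta / s))

assert np.allclose(scaled, equivalent)
```

Either view yields the same embedding, which is the point of the equivalence: the sequence-length adaptation can be described entirely as a change of the angle passed to the original rotation function.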

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Related
Classification of Long Sequence Modeling Problems
Increased Research Interest in Long-Context LLMs
Long-Context LLMs
Research Directions for Adapting Transformers to Long Contexts
Sparse Attention
Challenges in Training and Deploying High-Capacity Models
Challenge of Streaming Context for LLMs
Key Issues in Long-Context Language Modeling Methods
Challenge of Training New Architectures for Long-Context LLMs
Key Techniques for Long-Input Adaptation in LLMs
RoPE Scaling Transformation Equivalence
Architectural Prioritization for a Long-Context LLM
A development team is attempting to use a standard Transformer-based LLM for real-time analysis of continuous data streams, where the input sequence can grow to hundreds of thousands of tokens. They encounter two main problems: the time it takes to process each new token increases dramatically as the sequence gets longer, and the system frequently runs out of memory. Which statement correctly analyzes the architectural sources of these two distinct problems?
Differentiating Bottlenecks in Long-Sequence LLMs
Formula for RoPE with Linear Positional Interpolation
A researcher defines a new rotary position embedding function, Ro_new, for a token x_i at position i. The new function is defined as Ro_new(x_i, iθ) = Ro(x_i, (i+c)θ), where Ro is the original function and c is a constant offset. According to the general equivalence principle, this can be written as Ro_new(x_i, iθ) = Ro(x_i, iθ'). What is the correct expression for the transformed position parameter iθ'?
RoPE Scaling Transformation Equivalence
Equivalence of RoPE Modification Strategies
Analysis of a Flawed RoPE Modification
Learn After
Equation for Matching Periods in RoPE Base Scaling
An AI engineer is adapting a language model that was originally trained to handle sequences of 2000 tokens. The model uses a positional encoding method where each token's embedding is rotated by an angle corresponding to its position. The goal is to enable the model to process sequences up to 8000 tokens without a full retraining. The underlying mathematical principle of this encoding method states that applying a scaled rotation is equivalent to applying the original rotation with a transformed angle. Given this principle, what is the most direct and efficient strategy for the engineer to implement?
Explaining RoPE Scaling Equivalence
When adapting a rotary positional encoding system for longer text sequences, the principle of transformation equivalence states that applying a new, scaled rotation function with the original angle is equivalent to applying the original rotation function with a transformed angle.
You are reviewing a proposal to extend a productio...
You’re debugging a long-context retrofit of a pret...
Your team is extending a pretrained Transformer fr...
Choosing and Justifying a Positional Retrofit Under Long-Context and Latency Constraints
Selecting a Positional Strategy for a Long-Context Retrofit
Diagnosing Long-Context Failures Across Positional Schemes
You’re reviewing three proposed positional mechani...
Long-Context Retrofit Decision: RoPE Base Scaling vs ALiBi vs T5 Relative Bias
Root-Cause Analysis of Long-Context Degradation After a Positional-Encoding Retrofit
Post-Retrofit Regression: Separating Positional-Method Effects from Scaling Choices