Learn Before
Example of RoPE Capturing Relative Positional Information
Rotary Positional Embeddings (RoPE) are designed to capture the relative positions of tokens, a concept illustrated by the word pair 'cat' and 'sleeping' appearing in different sentences. In the first sentence, 'The₁ cat₂ is₃ sleeping₄ peacefully₅...', 'cat' is at position 2 and 'sleeping' is at position 4. In the second sentence, '...the₈ cat₉ is₁₀ sleeping₁₁ on₁₂...', the words are at positions 9 and 11. Although their absolute positions have changed, the relative distance between them remains 2. RoPE's rotational mechanism ensures that the angular relationship between the vector embeddings for 'cat' and 'sleeping' is determined solely by this constant relative distance, allowing the model to generalize relationships irrespective of their absolute location in a sequence.
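The invariance described above can be checked with a small numerical sketch. The snippet below is a minimal illustration, not the card's own formulation: it uses a single 2-D embedding pair, sets the rotation angle equal to the position (i.e., frequency 1), and the toy 'cat'/'sleeping' vectors and the `rotate` helper are assumed for demonstration only.

```python
import numpy as np

def rotate(vec, pos, freq=1.0):
    """RoPE-style rotation: the angle grows linearly with the token's position."""
    angle = pos * freq
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    return rot @ vec

# Toy 2-D query/key vectors for 'cat' and 'sleeping' (illustrative values).
q_cat = np.array([1.0, 0.3])
k_sleep = np.array([0.5, -0.8])

# Sentence 1: 'cat' at position 2, 'sleeping' at position 4 (distance 2).
score_1 = rotate(q_cat, 2) @ rotate(k_sleep, 4)

# Sentence 2: 'cat' at position 9, 'sleeping' at position 11 (distance 2).
score_2 = rotate(q_cat, 9) @ rotate(k_sleep, 11)

print(np.isclose(score_1, score_2))  # True: the score depends only on 4-2 == 11-9
```

The equality follows because a rotation by angle m composed with a rotation by angle n contributes only their difference n - m to the inner product, which is exactly the relative distance between the two tokens.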
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Rotary and Sinusoidal Embeddings
Conceptual Illustration of RoPE's Rotational Mechanism
Example of RoPE Capturing Relative Positional Information
Application of RoPE to d-dimensional Embeddings
Application of RoPE to Token Embeddings
RoPE as a Linear Combination of Periodic Functions
Consider two distinct methods for encoding a token's position within a sequence. Method A calculates a unique positional vector and adds it to the token's embedding. Method B applies a rotational transformation to the token's embedding, with the angle of rotation determined by the token's position. Based on these descriptions, which statement best analyzes a fundamental difference in how these two methods integrate positional context? (A short numerical sketch contrasting the two methods follows this Related list.)
Positional Information in Vector Transformations
Analyzing Relative Positional Information
Selecting a Positional Strategy for a Long-Context Retrofit
Diagnosing Long-Context Failures Across Positional Schemes
Choosing and Justifying a Positional Retrofit Under Long-Context and Latency Constraints
Long-Context Retrofit Decision: RoPE Base Scaling vs ALiBi vs T5 Relative Bias
Post-Retrofit Regression: Separating Positional-Method Effects from Scaling Choices
Root-Cause Analysis of Long-Context Degradation After a Positional-Encoding Retrofit
You are reviewing a proposal to extend a productio...
You’re reviewing three proposed positional mechani...
Your team is extending a pretrained Transformer fr...
You’re debugging a long-context retrofit of a pret...
Advantage of Rotary over Sinusoidal Embeddings for Long Sequences
Formula for Multiplicative Positional Embeddings
Angle Preservation in Rotary Embeddings
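For the "Method A vs Method B" question above, the contrast can be made concrete with a short sketch. The additive variant and the single-pair rotation below are simplified stand-ins under assumed values (illustrative vectors, a sinusoidal-style position vector, frequency 1), not the exact formulations compared in that question.

```python
import numpy as np

def add_positional(vec, pos):
    """Method A: add a position-dependent vector to the token embedding."""
    pos_vec = np.array([np.sin(pos), np.cos(pos)])  # sinusoidal-style, illustrative
    return vec + pos_vec

def rotate(vec, pos):
    """Method B: rotate the token embedding by an angle proportional to position."""
    angle = float(pos)
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    return rot @ vec

tok = np.array([1.0, 0.5])

# Method A mixes content and position in the same vector by addition,
# so the vector's length generally changes with position.
print(np.linalg.norm(tok), np.linalg.norm(add_positional(tok, 3)))

# Method B preserves the length: position only changes the vector's direction,
# so inner products between tokens encode their relative positions.
print(np.linalg.norm(tok), np.linalg.norm(rotate(tok, 3)))
```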
Learn After
A language model using a rotational mechanism for positional information processes two different sentences:
Sentence A: "...the powerful model at position 4 now predicts at position 6..."
Sentence B: "...we see the powerful model at position 15 now predicts at position 17..."
In both sentences, the relative distance between 'model' and 'predicts' is 2. Based on the principles of this rotational encoding method, what is the most accurate conclusion about the vector embeddings for 'model' and 'predicts'?
Comparing Positional Information Mechanisms
Consider a language model that encodes positional information by rotating token embeddings. In this model, the final vector for the word 'cat' at position 2 in a sentence will be identical to the final vector for the word 'cat' at position 9 in a different sentence.
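Both "Learn After" questions above can be checked numerically with the same toy rotation used earlier. The 2-D embeddings, the unit frequency, and the `rotate` helper below are assumptions made only for illustration.

```python
import numpy as np

def rotate(vec, pos):
    """RoPE-style rotation of a 2-D embedding by an angle proportional to position."""
    angle = float(pos)
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    return rot @ vec

cat = np.array([1.0, 0.3])
model = np.array([0.7, -0.2])
predicts = np.array([-0.4, 0.9])

# The same word at different absolute positions does NOT get an identical vector...
print(np.allclose(rotate(cat, 2), rotate(cat, 9)))  # False

# ...but pairs separated by the same distance keep the same relationship:
# positions (4, 6) and (15, 17) both differ by 2, so the attention scores match.
print(np.isclose(rotate(model, 4) @ rotate(predicts, 6),
                 rotate(model, 15) @ rotate(predicts, 17)))  # True
```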