Comparing Positional Information Mechanisms
Based on the fundamental difference in how these models handle positional information, explain why Model Y's rotational approach gives it a performance advantage over Model X's additive approach in this specific scenario.
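The contrast can be sketched in a few lines of numpy. The `rotate` and `additive` helpers below are illustrative stand-ins, not either model's actual implementation: a RoPE-style rotation of embedding pairs versus a sinusoidal vector added to the embedding. The key property is that rotational query-key dot products depend only on the relative offset, while additive scores shift with absolute position:

```python
import numpy as np

def rotate(x, pos, base=10000.0):
    """RoPE-style sketch: rotate each 2-D pair of x by a position-dependent angle."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)   # per-pair rotation frequencies
    out = np.empty_like(x)
    out[0::2] = x[0::2] * np.cos(pos * theta) - x[1::2] * np.sin(pos * theta)
    out[1::2] = x[0::2] * np.sin(pos * theta) + x[1::2] * np.cos(pos * theta)
    return out

def additive(x, pos, base=10000.0):
    """Sinusoidal additive sketch: add a fixed position vector to x."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)
    pe = np.empty_like(x)
    pe[0::2] = np.sin(pos * theta)
    pe[1::2] = np.cos(pos * theta)
    return x + pe

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Rotational: the query-key dot product depends only on the relative offset (2).
s1 = rotate(q, 4) @ rotate(k, 6)     # offset 2, near the start
s2 = rotate(q, 15) @ rotate(k, 17)   # offset 2, further along
print(np.isclose(s1, s2))            # True

# Additive: the same shift changes the score (cross terms depend on absolute position).
t1 = additive(q, 4) @ additive(k, 6)
t2 = additive(q, 15) @ additive(k, 17)
print(np.isclose(t1, t2))            # False, in general
```

This follows from the rotation identity R(a)u · R(b)v = u · R(b-a)v: only the angle difference, i.e. the relative position, survives in the score.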
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Evaluation in Bloom's Taxonomy
Related
A language model using a rotational mechanism for positional information processes two different sentences:
Sentence A: "...the powerful model at position 4 now predicts at position 6..."
Sentence B: "...we see the powerful model at position 15 now predicts at position 17..."
In both sentences, the relative distance between 'model' and 'predicts' is 2. Based on the principles of this rotational encoding method, what is the most accurate conclusion about the vector embeddings for 'model' and 'predicts'?
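The principle can be illustrated with a toy numpy sketch (the `rotate` helper below is a hypothetical RoPE-style rotation, not the model's real code): the rotated vectors themselves differ between the two sentences, but their dot product at relative distance 2 is preserved.

```python
import numpy as np

def rotate(x, pos, base=10000.0):
    # RoPE-style sketch: rotate consecutive 2-D pairs of x by position-dependent angles.
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * np.cos(pos * theta) - x[1::2] * np.sin(pos * theta)
    out[1::2] = x[0::2] * np.sin(pos * theta) + x[1::2] * np.cos(pos * theta)
    return out

rng = np.random.default_rng(1)
model, predicts = rng.normal(size=8), rng.normal(size=8)

# The rotated vectors differ across absolute positions 4 and 15...
v4, v15 = rotate(model, 4), rotate(model, 15)
print(np.allclose(v4, v15))          # False

# ...but the interaction at relative distance 2 is identical in both sentences.
s_a = rotate(model, 4) @ rotate(predicts, 6)
s_b = rotate(model, 15) @ rotate(predicts, 17)
print(np.isclose(s_a, s_b))          # True
```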
Consider a language model that encodes positional information by rotating token embeddings. In this model, the final vector for the word 'cat' at position 2 in a sentence will be identical to the final vector for the word 'cat' at position 9 in a different sentence.
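The claim can be checked with a toy rotational encoding (the `rotate` helper is an assumed RoPE-style sketch, not the model's implementation): since the rotation angle grows with position, the same token at positions 2 and 9 yields different vectors.

```python
import numpy as np

def rotate(x, pos, base=10000.0):
    # RoPE-style sketch: rotate each 2-D pair of x by an angle proportional to pos.
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * np.cos(pos * theta) - x[1::2] * np.sin(pos * theta)
    out[1::2] = x[0::2] * np.sin(pos * theta) + x[1::2] * np.cos(pos * theta)
    return out

cat = np.random.default_rng(2).normal(size=8)
v2, v9 = rotate(cat, 2), rotate(cat, 9)
print(np.allclose(v2, v9))   # False: the rotation angles differ, so the vectors differ
```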