Comparing Positional Information Mechanisms
Based on the fundamental difference in how these models handle positional information, explain why Model Y's rotational approach gives it a performance advantage over Model X's additive approach in this specific scenario.
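The contrast can be sketched in a few lines of numpy. The `rotate` and `additive` helpers below are illustrative stand-ins, not either model's actual implementation: a RoPE-style rotation of embedding pairs versus a sinusoidal vector added to the embedding. The key property is that rotational query-key dot products depend only on the relative offset, while additive scores shift with absolute position:

```python
import numpy as np

def rotate(x, pos, base=10000.0):
    """RoPE-style sketch: rotate each 2-D pair of x by a position-dependent angle."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)   # per-pair rotation frequencies
    out = np.empty_like(x)
    out[0::2] = x[0::2] * np.cos(pos * theta) - x[1::2] * np.sin(pos * theta)
    out[1::2] = x[0::2] * np.sin(pos * theta) + x[1::2] * np.cos(pos * theta)
    return out

def additive(x, pos, base=10000.0):
    """Sinusoidal additive sketch: add a fixed position vector to x."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)
    pe = np.empty_like(x)
    pe[0::2] = np.sin(pos * theta)
    pe[1::2] = np.cos(pos * theta)
    return x + pe

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Rotational: the query-key dot product depends only on the relative offset (2).
s1 = rotate(q, 4) @ rotate(k, 6)     # offset 2, near the start
s2 = rotate(q, 15) @ rotate(k, 17)   # offset 2, further along
print(np.isclose(s1, s2))            # True

# Additive: the same shift changes the score (cross terms depend on absolute position).
t1 = additive(q, 4) @ additive(k, 6)
t2 = additive(q, 15) @ additive(k, 17)
print(np.isclose(t1, t2))            # False, in general
```

This follows from the rotation identity R(a)u · R(b)v = u · R(b-a)v: only the angle difference, i.e. the relative position, survives in the score.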
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Evaluation in Bloom's Taxonomy
Related
A language model using a rotational mechanism for positional information processes two different sentences:
Sentence A: "...the powerful model at position 4 now predicts at position 6..."
Sentence B: "...we see the powerful model at position 15 now predicts at position 17..."
In both sentences, the relative distance between 'model' and 'predicts' is 2. Based on the principles of this rotational encoding method, what is the most accurate conclusion about the vector embeddings for 'model' and 'predicts'?
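The principle can be illustrated with a toy numpy sketch (the `rotate` helper below is a hypothetical RoPE-style rotation, not the model's real code): the rotated vectors themselves differ between the two sentences, but their dot product at relative distance 2 is preserved.

```python
import numpy as np

def rotate(x, pos, base=10000.0):
    # RoPE-style sketch: rotate consecutive 2-D pairs of x by position-dependent angles.
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * np.cos(pos * theta) - x[1::2] * np.sin(pos * theta)
    out[1::2] = x[0::2] * np.sin(pos * theta) + x[1::2] * np.cos(pos * theta)
    return out

rng = np.random.default_rng(1)
model, predicts = rng.normal(size=8), rng.normal(size=8)

# The rotated vectors differ across absolute positions 4 and 15...
v4, v15 = rotate(model, 4), rotate(model, 15)
print(np.allclose(v4, v15))          # False

# ...but the interaction at relative distance 2 is identical in both sentences.
s_a = rotate(model, 4) @ rotate(predicts, 6)
s_b = rotate(model, 15) @ rotate(predicts, 17)
print(np.isclose(s_a, s_b))          # True
```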
Consider a language model that encodes positional information by rotating token embeddings. In this model, the final vector for the word 'cat' at position 2 in a sentence will be identical to the final vector for the word 'cat' at position 9 in a different sentence.
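The claim can be checked with a toy rotational encoding (the `rotate` helper is an assumed RoPE-style sketch, not the model's implementation): since the rotation angle grows with position, the same token at positions 2 and 9 yields different vectors.

```python
import numpy as np

def rotate(x, pos, base=10000.0):
    # RoPE-style sketch: rotate each 2-D pair of x by an angle proportional to pos.
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * np.cos(pos * theta) - x[1::2] * np.sin(pos * theta)
    out[1::2] = x[0::2] * np.sin(pos * theta) + x[1::2] * np.cos(pos * theta)
    return out

cat = np.random.default_rng(2).normal(size=8)
v2, v9 = rotate(cat, 2), rotate(cat, 9)
print(np.allclose(v2, v9))   # False: the rotation angles differ, so the vectors differ
```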