Consider a language model that encodes positional information by rotating token embeddings. In this model, the final vector for the word 'cat' at position 2 in a sentence will be identical to the final vector for the word 'cat' at position 9 in a different sentence.
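To make the claim concrete, here is a minimal numeric sketch of such a rotational scheme. It uses a toy 4-dimensional embedding and an illustrative `rotate` helper (assumed names and values, not code from any particular library or from the course text), and shows how the rotation applied to a token's vector depends on its absolute position:

```python
import numpy as np

def rotate(vec, pos, base=10000.0):
    """Rotate consecutive 2-D pairs of `vec` by angles proportional to `pos`,
    in the style of a rotary positional encoding (illustrative sketch only)."""
    d = vec.shape[0]
    out = vec.astype(float)
    for i in range(0, d, 2):
        theta = pos / base ** (i / d)       # angle grows with absolute position
        cos, sin = np.cos(theta), np.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * cos - y * sin
        out[i + 1] = x * sin + y * cos
    return out

cat = np.array([1.0, 0.0, 0.5, -0.5])       # toy embedding for 'cat'
v_pos2 = rotate(cat, pos=2)
v_pos9 = rotate(cat, pos=9)
print(np.allclose(v_pos2, v_pos9))          # False: the rotation depends on the absolute position
```

The per-pair angles shrink geometrically with the dimension index, mirroring the frequency schedule used by rotary encodings; whether the two rotated vectors come out identical is exactly what the statement above asks you to judge.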
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Related
A language model using a rotational mechanism for positional information processes two different sentences:
Sentence A: "...the powerful model at position 4 now predicts at position 6..."
Sentence B: "...we see the powerful model at position 15 now predicts at position 17..."
In both sentences, the relative distance between 'model' and 'predicts' is 2. Based on the principles of this rotational encoding method, what is the most accurate conclusion about the vector embeddings for 'model' and 'predicts'?
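For this relative-distance question, a similar hedged sketch (the same illustrative `rotate` helper as above, with toy query and key vectors chosen arbitrarily to stand in for 'model' and 'predicts') shows what happens to the dot product when both tokens are shifted by the same amount:

```python
import numpy as np

def rotate(vec, pos, base=10000.0):
    """Same illustrative rotary-style rotation as in the sketch above."""
    d = vec.shape[0]
    out = vec.astype(float)
    for i in range(0, d, 2):
        theta = pos / base ** (i / d)
        cos, sin = np.cos(theta), np.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * cos - y * sin
        out[i + 1] = x * sin + y * cos
    return out

q = np.array([0.3, 1.2, -0.7, 0.4])    # toy query vector standing in for 'model'
k = np.array([1.1, -0.2, 0.6, 0.9])    # toy key vector standing in for 'predicts'

# Sentence A: positions 4 and 6.  Sentence B: positions 15 and 17.  Both offsets are 2.
score_a = rotate(q, pos=4) @ rotate(k, pos=6)
score_b = rotate(q, pos=15) @ rotate(k, pos=17)

print(np.allclose(rotate(q, 4), rotate(q, 15)))   # False: the rotated vectors themselves differ
print(np.isclose(score_a, score_b))               # True: the dot product depends only on the offset
```

The individual rotated vectors differ between the two sentences, but the score computed from them does not, because the rotation angles applied to each pair cancel down to the offset between the two positions.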
Comparing Positional Information Mechanisms