Learn Before
Positional Invariance in Self-Attention
Consider two identical phrases, 'the quick brown fox', appearing at the beginning of one document and in the middle of another. A self-attention mechanism processes the relationship between the words 'quick' and 'fox' in both instances. Explain why a model using relative positional representations would compute a more consistent attention score for this word pair across the two documents than a model using absolute positional representations based on each token's fixed index from the start of the sequence.
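The contrast in the question can be made concrete with a toy numerical sketch. The snippet below (all embeddings, projection matrices, and the dictionary-based relative-offset table are hypothetical illustrations, not any particular model's parameters) computes an attention logit for the pair ('quick', 'fox') two ways: once with sinusoidal absolute encodings added to the token embeddings, and once with a Shaw-et-al.-style relative scheme where an offset embedding is added to the key. Shifting the phrase 500 positions changes the absolute-encoding score but leaves the relative score untouched, because the latter depends only on the offset between the two tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy model dimension

# Hypothetical token embeddings for 'quick' and 'fox'.
quick, fox = rng.normal(size=d), rng.normal(size=d)
Wq = rng.normal(size=(d, d)) / np.sqrt(d)  # query projection
Wk = rng.normal(size=(d, d)) / np.sqrt(d)  # key projection

def sinusoidal(pos, d):
    """Standard sinusoidal absolute positional encoding for one position."""
    i = np.arange(d // 2)
    angles = pos / 10000 ** (2 * i / d)
    return np.concatenate([np.sin(angles), np.cos(angles)])

def abs_score(p_quick, p_fox):
    # Absolute scheme: position vectors are added to embeddings,
    # so the score depends on the raw indices p_quick and p_fox.
    q = (quick + sinusoidal(p_quick, d)) @ Wq
    k = (fox + sinusoidal(p_fox, d)) @ Wk
    return q @ k / np.sqrt(d)

# Relative scheme: one learned vector per offset (hypothetical table).
rel = {o: rng.normal(size=d) for o in range(-4, 5)}

def rel_score(p_quick, p_fox):
    # Only the offset p_fox - p_quick enters the computation.
    q = quick @ Wq
    k = fox @ Wk + rel[p_fox - p_quick]
    return q @ k / np.sqrt(d)

# Same word pair at the start of one document and 500 tokens into another.
print(abs_score(1, 3), abs_score(501, 503))  # scores differ
print(rel_score(1, 3), rel_score(501, 503))  # scores identical (offset is +2 both times)
```

Running this shows the relative scores are bit-for-bit identical across the two placements, while the absolute-encoding scores drift apart, which is the consistency the question asks you to explain.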
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is trained exclusively on text sequences with a maximum length of 512 tokens. During evaluation, the model shows a significant drop in performance when processing documents that are 1000 tokens long. The engineers hypothesize the problem is related to how the model incorporates word order information. Which of the following changes to the model's architecture is most likely to resolve this specific issue?
Positional Encoding for Machine Translation
Positional Invariance in Self-Attention
Mechanism of Relative Positional Embedding