Short Answer

Positional Invariance in Self-Attention

Consider the same phrase, 'the quick brown fox', appearing at the beginning of one document and in the middle of another. A self-attention mechanism computes the relationship between the words 'quick' and 'fox' in both instances. Explain why a model using relative positional representations would compute a more consistent attention score for this word pair across the two documents than a model whose positional representations are based on a token's absolute index from the start of the sequence.
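
A minimal numerical sketch of this contrast is below, assuming a toy 8-dimensional embedding, the classic sinusoidal encoding for the absolute case, and a single hypothetical learned scalar bias per relative offset (all names and values are illustrative, not part of the original question):

import numpy as np

np.random.seed(0)
d = 8  # toy embedding width

# Fixed content embeddings for 'quick' and 'fox' (random stand-ins).
quick = np.random.randn(d)
fox = np.random.randn(d)

def sinusoidal(pos, d):
    # Absolute encoding keyed to a token's index from the start of the sequence.
    i = np.arange(d // 2)
    angles = pos / (10000 ** (2 * i / d))
    return np.concatenate([np.sin(angles), np.cos(angles)])

def absolute_score(pos_q, pos_k):
    # Attention logit when absolute position vectors are added to the token inputs.
    q = quick + sinusoidal(pos_q, d)
    k = fox + sinusoidal(pos_k, d)
    return q @ k / np.sqrt(d)

# Relative scheme: one learned scalar per offset; 'fox' sits two tokens
# to the right of 'quick', so only the offset +2 matters here.
rel_bias = {2: 0.7}  # hypothetical learned value for offset +2

def relative_score(pos_q, pos_k):
    # The logit depends only on content plus the offset pos_k - pos_q.
    return quick @ fox / np.sqrt(d) + rel_bias[pos_k - pos_q]

# Document A: phrase starts the text ('quick' at index 1, 'fox' at index 3).
# Document B: phrase sits mid-document ('quick' at 41, 'fox' at 43).
print(absolute_score(1, 3), absolute_score(41, 43))   # two different logits
print(relative_score(1, 3), relative_score(41, 43))   # the same logit twice

Because the relative score is a function of the offset alone, shifting the whole phrase deeper into a document leaves it unchanged, which is exactly the invariance the question asks you to explain.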

Updated 2025-10-06

Tags

Data Science

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science