Learn Before
Relative Position Offset Calculation
The relative position offset between a query at index i and a key at index j is calculated as the simple difference of their indices. This value quantifies the distance and direction between two tokens in a sequence and is computed using the formula:
offset = i - j
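As a minimal sketch of the calculation described above, assuming the convention offset = i - j with 0-based indices: the function names `relative_offset` and `offset_matrix` are illustrative and do not come from any particular library.

```python
from typing import List


def relative_offset(i: int, j: int) -> int:
    """Offset between a query at index i and a key at index j (assumed: i - j)."""
    return i - j


def offset_matrix(seq_len: int) -> List[List[int]]:
    """Row i, column j holds the offset i - j for every query/key pair."""
    return [[relative_offset(i, j) for j in range(seq_len)]
            for i in range(seq_len)]


# Print the full table of offsets for a short sequence of 4 tokens.
for row in offset_matrix(4):
    print(row)
```

Under this convention the diagonal is zero, entries where the key comes before the query are positive, and entries where the key comes after the query are negative. The absolute value gives the distance and the sign gives the direction.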
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Interpretation of Positional Bias as a Distance Penalty
T5 Bias for Relative Positional Embedding
Shared Learnable Bias per Offset
Heuristic-Based Relative Positional Biases
Comparison of Learned vs. Heuristic-Based Relative Positional Biases
Kerple
FIRE
Relative Position Offset Calculation
A self-attention model incorporates positional awareness by adding a bias term directly to the query-key dot product for each pair of positions (i, j). This bias term's value depends on the relative distance between i and j. What is the primary implication of this approach compared to the alternative of adding positional vectors to the input token embeddings?
Incorporating Positional Bias into Attention Scores
In a self-attention mechanism, the score computed between a query at position i and a key at position j is modified by directly adding a bias term whose value depends only on the positions i and j. What is the primary function of this bias term within the attention calculation?
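The mechanism these items describe, a bias added to each query-key dot product, can be sketched as follows. This is only an illustration under stated assumptions: the helper `biased_scores`, the lookup table `bias_by_offset`, and its values are hypothetical, the bias is assumed to depend on the offset i - j, and scaling and softmax are left out for brevity.

```python
from typing import Dict, List


def biased_scores(queries: List[List[float]],
                  keys: List[List[float]],
                  bias_by_offset: Dict[int, float]) -> List[List[float]]:
    """Return raw attention scores q_i . k_j plus a bias looked up by i - j."""
    scores = []
    for i, q in enumerate(queries):
        row = []
        for j, k in enumerate(keys):
            dot = sum(a * b for a, b in zip(q, k))   # query-key dot product
            bias = bias_by_offset.get(i - j, 0.0)    # assumed: 0.0 if offset unseen
            row.append(dot + bias)
        scores.append(row)
    return scores


# Hypothetical two-token example with made-up bias values.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
bias_by_offset = {0: 0.0, 1: -0.5, -1: -0.25}
scores = biased_scores(queries, keys, bias_by_offset)
print(scores)
```

Because the bias is keyed by the offset alone in this sketch, every pair of positions with the same i - j receives the same adjustment, regardless of where the pair sits in the sequence.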
Learn After
Shared Learnable Bias per Offset
In a self-attention mechanism that uses relative positioning, consider a sequence of tokens where the model is calculating the attention score. If the current query token is at index 8 and the key token being attended to is at index 5, what is the calculated offset that represents their relative position?
A self-attention model calculates the relative position offset between a query at index i and a key at index j using the formula: offset = i - j. Based on this formula, which of the following conclusions is correct?
In a sequence of tokens, the relative position offset between a query at index i and a key at index j is calculated as i - j. If the query's position i is held constant while the key's position j increases (i.e., the key token appears later in the sequence), how does the calculated offset change?