Learn Before
In a sequence of tokens, the relative position offset between a query at index i and a key at index j is calculated as i - j. If the query's position i is held constant while the key's position j increases (i.e., the key token appears later in the sequence), how does the calculated offset change?
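The relationship the question asks about can be checked directly. Below is a minimal sketch (the function name `relative_offset` is an assumption, not from the source) that computes offset = i - j and shows what happens when the query position i is held fixed while the key position j increases:

```python
def relative_offset(i: int, j: int) -> int:
    """Relative position offset between a query at index i and a key at index j."""
    return i - j

# Hold the query position fixed at i = 8 and let the key position j grow:
i = 8
offsets = [relative_offset(i, j) for j in range(5, 9)]
print(offsets)  # [3, 2, 1, 0] -- the offset decreases as j increases
```

As the output shows, with i constant, each increase in j reduces i - j by one, so the calculated offset decreases (eventually becoming negative once the key lies after the query).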
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Shared Learnable Bias per Offset
In a self-attention mechanism that uses relative positioning, consider a sequence of tokens where the model is calculating the attention score. If the current query token is at index 8 and the key token being attended to is at index 5, what is the calculated offset that represents their relative position?
A self-attention model calculates the relative position offset between a query at index i and a key at index j using the formula: offset = i - j. Based on this formula, which of the following conclusions is correct?