T5 Bias for Relative Positional Embedding
The T5 bias, introduced by Raffel et al. (2020), is a relative positional embedding scheme that generalizes offset-specific biases. Assigning a unique learnable parameter to every query-key offset generalizes poorly, because an offset that never appears in training has no trained parameter. T5 instead groups query-key offsets into a limited number of 'buckets,' each associated with a single, shared learnable parameter. This lets the model handle a wide range of relative positions, including those not seen during training.
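The bucketing idea can be illustrated with a small sketch. This is a simplified, causal-only (query at or after key) version that assumes small offsets each get their own bucket while larger offsets are grouped on a logarithmic scale and clamped to the last bucket. The function names, the `num_buckets` and `max_distance` parameters, and their default values are illustrative assumptions, not a definitive reproduction of the T5 code.

```python
import math


def relative_position_bucket(offset, num_buckets=32, max_distance=128):
    """Map a non-negative query-key offset (i - j) to a bucket index.

    Hypothetical helper for illustration; defaults are assumed values.
    """
    if offset < 0:
        raise ValueError("this sketch only handles offsets with i >= j")

    # The first half of the buckets holds one exact offset each.
    max_exact = num_buckets // 2
    if offset < max_exact:
        return offset

    # Larger offsets share buckets that widen logarithmically up to max_distance.
    scale = math.log(offset / max_exact) / math.log(max_distance / max_exact)
    bucket = max_exact + int(scale * (num_buckets - max_exact))

    # Every offset at or beyond max_distance falls into the last bucket.
    return min(bucket, num_buckets - 1)


def positional_bias(i, j, bias_table):
    """Look up the shared bias for query position i and key position j."""
    return bias_table[relative_position_bucket(i - j, num_buckets=len(bias_table))]


# Stand-in for the learnable parameters: one scalar per bucket.
bias_table = [0.1 * b for b in range(32)]

print(relative_position_bucket(5))     # 5: small offsets keep their own bucket
print(relative_position_bucket(128))   # 31: clamped to the last bucket
print(relative_position_bucket(1000))  # 31: an unseen long offset reuses a trained bucket
```

Because the number of parameters is fixed by the number of buckets rather than by the sequence length, an offset far beyond anything seen in training still maps to a parameter that was updated during training.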

Related
Interpretation of Positional Bias as a Distance Penalty
Shared Learnable Bias per Offset
Heuristic-Based Relative Positional Biases
Comparison of Learned vs. Heuristic-Based Relative Positional Biases
Kerple
FIRE
Relative Position Offset Calculation
A self-attention model incorporates positional awareness by adding a bias term directly to the query-key dot product for each pair of positions (i, j). This bias term's value depends on the relative distance between i and j. What is the primary implication of this approach compared to the alternative of adding positional vectors to the input token embeddings?
Incorporating Positional Bias into Attention Scores
In a self-attention mechanism, the score computed between a query at position i and a key at position j is modified by directly adding a bias term whose value depends only on the positions i and j. What is the primary function of this bias term within the attention calculation?
Generalization Limit of Offset-Specific Biases
Calculating Positional Bias from Offset
In a self-attention mechanism that uses a shared, learnable parameter for each unique relative position offset, which of the following query-key pairs will share the exact same positional bias parameter as the pair with a query at position 8 and a key at position 3?
Parameter Implications of Offset-Based Positional Bias