Similarity of ALiBi Positional Biases to Length Features
The functional form of the right-hand side of the ALiBi (Attention with Linear Biases) equation closely resembles the length features used in conventional feature-based systems. In statistical machine translation, for example, such length features were widely used to model word reordering, yielding models that generalize well across translation tasks.
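To make the functional form concrete, below is a minimal sketch of an ALiBi-style linear bias, -β ⋅ (i - j), added to causal attention scores. It is an illustration, not the reference implementation: the slope β = 0.1 and the 5-token sequence are arbitrary choices (ALiBi itself fixes a per-head slope schedule), and NumPy stands in for a real attention stack.

```python
import numpy as np

def linear_position_bias(seq_len: int, beta: float) -> np.ndarray:
    """Bias matrix with entry (i, j) = -beta * (i - j).

    i indexes query positions, j indexes key positions; more distant
    keys receive a more negative bias.
    """
    i = np.arange(seq_len)[:, None]   # query positions, column vector
    j = np.arange(seq_len)[None, :]   # key positions, row vector
    bias = -beta * (i - j).astype(float)
    bias[j > i] = -np.inf             # causal mask: no attending to the future
    return bias

# Toy attention scores for a 5-token sequence; the bias is added
# to the scores before the softmax, as in ALiBi.
scores = np.random.randn(5, 5)
biased_scores = scores + linear_position_bias(5, beta=0.1)
print(linear_position_bias(5, beta=0.1))
```

Note how this is exactly the shape of a hand-crafted length feature: a single scalar weight β multiplying a query-key distance, with no learned embedding per position.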
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Related
Formula for Attention Score with ALiBi Bias
Linear Relative Position Bias Example
In a sequence processing model, a positional bias is calculated to penalize attention scores based on the distance between tokens. The formula used is Bias = -β ⋅ (i - j), where i is the query position, j is the key position, and β is a fixed scalar. If the query token is at position 5, the key token is at position 2, and β = 0.1, what is the calculated bias value?
Visual Example of a Linear Relative Position Bias in Causal Attention
True or False: According to the positional bias formula PE(i, j) = -β ⋅ (i - j), where i is the query position, j is the key position, and β is a positive scalar, the penalty applied to the attention score decreases as the distance between the query and key tokens increases.
Interpreting a Linear Positional Bias Value
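For reference, working the example above through the formula: with i = 5, j = 2, and β = 0.1, Bias = -0.1 ⋅ (5 - 2) = -0.3. Because β is positive, the bias becomes more negative (a larger penalty) as the query-key distance grows.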