Learn Before
Formula for Attention Score with ALiBi Bias
The ALiBi method modifies the standard attention score by adding a positional bias term, PE(i, j), directly to the scaled query-key dot product. This integration of the linear bias into the attention calculation results in the following formula for the pre-Softmax score:
Score(i, j) = (q_i ⋅ k_j) / √d + PE(i, j), where PE(i, j) = -β ⋅ (i - j)
Here q_i is the query vector at position i, k_j is the key vector at position j, d is the dimension of the key vectors, and β is a positive scalar that controls how strongly distant tokens are penalized.
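As a minimal sketch (not part of the original card), the NumPy code below computes this pre-Softmax score matrix for a toy sequence; the function name alibi_scores, the toy dimensions, and the random inputs are illustrative assumptions.

```python
import numpy as np

def alibi_scores(Q, K, beta):
    """Pre-Softmax attention scores with a linear positional bias.

    Q, K: (seq_len, d) arrays of query and key vectors.
    beta: positive scalar controlling the distance penalty.
    Returns Score(i, j) = (q_i . k_j) / sqrt(d) + PE(i, j),
    where PE(i, j) = -beta * (i - j).
    """
    seq_len, d = Q.shape
    dot = (Q @ K.T) / np.sqrt(d)        # scaled query-key dot products
    i = np.arange(seq_len)[:, None]     # query positions (column vector)
    j = np.arange(seq_len)[None, :]     # key positions (row vector)
    pe = -beta * (i - j)                # linear relative-position bias
    # In causal attention, entries with j > i would additionally be masked out.
    return dot + pe                     # pre-Softmax scores

# Toy check: 6 tokens, 4-dimensional vectors, beta = 0.1
rng = np.random.default_rng(0)
Q = rng.normal(size=(6, 4))
K = rng.normal(size=(6, 4))
print(alibi_scores(Q, K, beta=0.1).shape)  # (6, 6)
```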
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formula for Attention Score with ALiBi Bias
Linear Relative Position Bias Example
In a sequence processing model, a positional bias is calculated to penalize attention scores based on the distance between tokens. The formula used is
Bias = -β ⋅ (i - j), where i is the query position, j is the key position, and β is a fixed scalar. If the query token is at position 5, the key token is at position 2, and β = 0.1, what is the calculated bias value? (A worked computation appears after this Related list.)
Visual Example of a Linear Relative Position Bias in Causal Attention
True or False: According to the positional bias formula
PE(i, j) = -β ⋅ (i - j), where i is the query position, j is the key position, and β is a positive scalar, the penalty applied to the attention score decreases as the distance between the query and key tokens increases.
Interpreting a Linear Positional Bias Value
Similarity of ALiBi Positional Biases to Length Features
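For reference, the arithmetic in the Linear Relative Position Bias Example above works out as follows (this worked step is illustrative and not part of the original cards): Bias = -0.1 ⋅ (5 - 2) = -0.3. Because β is positive, the bias becomes more negative as the query-key distance i - j grows, so more distant keys are penalized more heavily.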
Learn After
Complete ALiBi Attention Formula
Calculating a Pre-Softmax Attention Score with Linear Bias
In a model that adds a linear positional bias to its attention calculation, a query at position
i = 10 attends to two keys at positions j1 = 5 and j2 = 2. Assuming the scaled dot-product portion of the score is identical for both keys, how will the addition of the positional bias term PE(i, j) affect their final pre-Softmax attention scores? (A brief worked illustration follows below.)
Interaction of Semantic and Positional Scores
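As an illustrative check of the scenario above (not part of the original card), assume PE(i, j) = -β ⋅ (i - j) with a positive β, as in the earlier cards: PE(10, 5) = -5β while PE(10, 2) = -8β. The more distant key at j2 = 2 therefore receives the more negative bias, so its pre-Softmax score ends up lower than that of the key at j1 = 5, even though the scaled dot products are identical.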