Learn Before
ALiBi Bias Term Formula
In Attention with Linear Biases (ALiBi), the positional bias term is calculated as the negative scaled difference between the query position and the key position. Specifically, for a query at position i and a key at position j, the positional embedding bias is defined by the equation PE(i, j) = -β ⋅ (i - j). This can be equivalently formulated by distributing the negative sign to yield PE(i, j) = β ⋅ (j - i), where β is a scaling factor.
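To make the formula concrete, the following is a minimal sketch in Python of how such a bias matrix could be computed (the function name alibi_bias, the use of NumPy, and the illustrative values seq_len = 4 and beta = 0.5 are assumptions for this example, not part of the original card):

```python
import numpy as np

def alibi_bias(seq_len: int, beta: float) -> np.ndarray:
    """Compute PE(i, j) = -beta * (i - j) for every query/key pair.

    Returns a (seq_len, seq_len) matrix that can be added to the
    raw attention scores before the softmax.
    """
    i = np.arange(seq_len)[:, None]  # query positions as a column
    j = np.arange(seq_len)[None, :]  # key positions as a row
    return -beta * (i - j)           # equivalently: beta * (j - i)

print(alibi_bias(seq_len=4, beta=0.5))
# Entry (i, j) holds -0.5 * (i - j); e.g. entry (3, 0) is -1.5,
# so keys farther behind the query receive a stronger penalty.
```

Note that in the full ALiBi method each attention head uses its own fixed slope for β; the sketch above shows a single head.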

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
ALiBi Bias Term Formula
Imagine a self-attention mechanism where a modification adds a penalty to the attention score between any two words. This penalty is designed to increase in a straight, consistent line as the distance between the words' positions in the sequence grows. What is the most likely behavioral outcome of this modification?
In a self-attention mechanism that incorporates a linear bias based on token distance, the bias term added to the attention score is a non-positive value that becomes increasingly negative as the distance between the query and key increases.
Effect of Distance-Based Attention Penalty
Learn After
Formula for Attention Score with ALiBi Bias
Linear Relative Position Bias Example
In a sequence processing model, a positional bias is calculated to penalize attention scores based on the distance between tokens. The formula used is Bias = -β ⋅ (i - j), where i is the query position, j is the key position, and β is a fixed scalar. If the query token is at position 5, the key token is at position 2, and β = 0.1, what is the calculated bias value? (This arithmetic is worked through after this list.)
Visual Example of a Linear Relative Position Bias in Causal Attention
True or False: According to the positional bias formula PE(i, j) = -β ⋅ (i - j), where i is the query position, j is the key position, and β is a positive scalar, the penalty applied to the attention score decreases as the distance between the query and key tokens increases.
Interpreting a Linear Positional Bias Value
Similarity of ALiBi Positional Biases to Length Features
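For reference, here is the arithmetic from the linear relative position bias question above, worked out with the values given there (query position i = 5, key position j = 2, β = 0.1):

Bias = -β ⋅ (i - j) = -0.1 ⋅ (5 - 2) = -0.3

The result is negative, so 0.3 is subtracted from that query-key attention score, consistent with more distant pairs receiving a larger penalty.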