Geometric Progression Formula for ALiBi's β Scalar per Head
When a geometric progression is used to set the scalar bias β for each head in ALiBi (Attention with Linear Biases), the value for the k-th head is calculated as:

$$\beta_k = \frac{1}{2^{8/k}}$$

where k = 1, 2, … is the head index.
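As a rough sketch, the snippet below evaluates β_k = 1 / 2^(8/k) for an 8-head layer and builds a causal linear-bias matrix for a single head. It assumes the usual ALiBi form, in which the bias -β·(i - j) is added to the attention score between query position i and key position j, and it masks future keys with -inf. The function names `alibi_betas` and `linear_bias` are hypothetical, chosen for illustration.

```python
# Sketch (not a reference implementation): per-head scalars from
# beta_k = 1 / 2**(8 / k), plus a causal linear-bias matrix for one head.
# Function names are hypothetical.


def alibi_betas(num_heads: int) -> list[float]:
    """Return beta_k = 1 / 2**(8 / k) for heads k = 1..num_heads."""
    return [1.0 / 2 ** (8 / k) for k in range(1, num_heads + 1)]


def linear_bias(beta: float, seq_len: int) -> list[list[float]]:
    """Causal bias for one head: beta * (j - i), i.e. -beta * (i - j), for key j <= query i.

    Keys after the query (j > i) get -inf, as a causal mask would.
    """
    return [
        [beta * (j - i) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    betas = alibi_betas(8)
    for k, b in enumerate(betas, start=1):
        print(f"head {k}: beta = {b:.5f}")
    # Head 4 has beta = 1 / 2**2 = 0.25; its last bias row is
    # [-0.75, -0.5, -0.25, 0.0]: the farther the key, the larger the penalty.
    print(linear_bias(betas[3], 4))
```

Under this formula β_k rises with k, from 1/256 at k = 1 to 1/2 at k = 8, so later heads penalize distance more strongly. Consecutive values do not share a constant ratio (β_2/β_1 = 16, while β_3/β_2 = 2^(4/3) ≈ 2.52).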
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Related
Evaluating Strategies for Setting Positional Bias Scalars
An engineer is configuring a multi-head attention layer with 8 heads that uses a linear positional bias. Instead of tuning a separate bias scalar (β) for each head, they set the values to form a decreasing geometric sequence (e.g., Head 1 β=0.5, Head 2 β=0.25, Head 3 β=0.125, and so on). What is the primary advantage of this configuration strategy?
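The 8-head example above (0.5, 0.25, 0.125, and so on) is a decreasing geometric sequence with first term 0.5 and ratio 0.5. The sketch below generates it from those two numbers and checks that the ratio is constant. For comparison, Press et al. describe ALiBi slopes for n heads as the geometric sequence that starts at 2^(-8/n) and uses that same value as its ratio, i.e. 2^(-8k/n); for n = 8 this gives the same halving pattern. The helper `geometric_betas` and its defaults are hypothetical.

```python
# Sketch: the decreasing 8-head sequence from the example (0.5, 0.25, 0.125, ...).
# The helper name and its default arguments are hypothetical.


def geometric_betas(num_heads: int, first: float = 0.5, ratio: float = 0.5) -> list[float]:
    """Return first * ratio**(k - 1) for heads k = 1..num_heads."""
    return [first * ratio ** (k - 1) for k in range(1, num_heads + 1)]


betas = geometric_betas(8)
print(betas)

# Two numbers (first term and ratio) fix all eight scalars, and every
# consecutive pair differs by the same factor.
ratios = [b / a for a, b in zip(betas, betas[1:])]
print(ratios)

# Comparison with the slope form 2**(-8 * k / n) for n heads; with n = 8
# it reproduces the same halving sequence.
n = 8
paper_form = [2 ** (-8 * k / n) for k in range(1, n + 1)]
print(paper_form == betas)
```

Because the values are exact powers of two, the comparison with the 2^(-8k/n) form holds exactly in floating point here.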
Rationale for Geometric Progression in Positional Bias
Learn After
In a multi-head attention model, the linear bias scalar for each attention head is determined by a geometric progression. The value for the k-th head is calculated using the formula: β_k = 1 / (2^(8/k)). For a model with at least 4 heads, what is the calculated bias scalar for the 4th head (i.e., when k=4)?
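Substituting k = 4 into β_k = 1 / (2^(8/k)) works out as follows:

```latex
\beta_4 = \frac{1}{2^{8/4}} = \frac{1}{2^{2}} = \frac{1}{4} = 0.25
```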
In a multi-head attention mechanism, a scalar bias for each head is determined by a geometric progression. The value for the k-th head is calculated using the formula: β_k = 1 / (2^(8/k)). How does the value of this scalar bias change as the head index k increases?
Selecting an Appropriate Bias Scalar Formula