Rationale for Geometric Progression in Positional Bias
A common and effective heuristic for setting the positional bias scalar in a multi-head attention layer is to assign a unique, decreasing value to each head, such that the values form a geometric progression. Explain the primary reason this approach is considered a robust strategy for model configuration.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Geometric Progression Formula for ALiBi's β Scalar per Head
Evaluating Strategies for Setting Positional Bias Scalars
An engineer is configuring a multi-head attention layer with 8 heads that uses a linear positional bias. Instead of tuning a separate bias scalar (β) for each head, they set the values to form a decreasing geometric sequence (e.g., Head 1 β=0.5, Head 2 β=0.25, Head 3 β=0.125, and so on). What is the primary advantage of this configuration strategy?
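The geometric sequence described above can be sketched in a few lines. This is a minimal illustration, not the engineer's actual code: the helper names `alibi_slopes` and `linear_bias` are hypothetical, and the ratio `2 ** (-8 / num_heads)` is the convention from the ALiBi paper, which for 8 heads reproduces the values in the question (0.5, 0.25, 0.125, ...).

```python
# Hypothetical sketch of ALiBi-style positional bias slopes.
# Each head gets a slope from a decreasing geometric sequence, so heads
# span a range of effective attention spans without per-head tuning.

def alibi_slopes(num_heads: int) -> list[float]:
    # Geometric sequence with ratio 2^(-8/n); for 8 heads: 0.5, 0.25, ...
    ratio = 2.0 ** (-8.0 / num_heads)
    return [ratio ** (h + 1) for h in range(num_heads)]

def linear_bias(slope: float, seq_len: int) -> list[list[float]]:
    # Additive bias on attention scores: -slope * (query_pos - key_pos)
    # for causal positions (j <= i); future positions left at 0 here.
    return [[-slope * (i - j) if j <= i else 0.0 for j in range(seq_len)]
            for i in range(seq_len)]

slopes = alibi_slopes(8)
print(slopes[:3])  # first three heads: [0.5, 0.25, 0.125]
```

A head with a large slope (0.5) penalizes distant tokens heavily and so focuses on local context, while a head with a tiny slope attends over long ranges almost uniformly; the geometric spacing covers these scales evenly in the exponent.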