Concept

Geometric Progression for ALiBi's β Scalar per Head

While the ALiBi bias scalar β can be tuned per head, research shows that an effective alternative for multi-head attention is to set β to values that decrease geometrically across the heads by a factor of 1/2^a. This heuristic performs well across a variety of tasks without requiring per-head tuning on a validation dataset.
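The geometric schedule above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes the common ALiBi choice of a ratio of 2^(-8/n) for n heads, so that for 8 heads the per-head slopes are 1/2, 1/4, ..., 1/256.

```python
def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head ALiBi bias slopes as a geometric sequence.

    Assumes the commonly used ratio 2 ** (-8 / num_heads), which for
    power-of-two head counts yields slopes 2**-1, 2**-2, ..., 2**-8
    spread across the heads. Head i (1-indexed) gets ratio ** i.
    """
    ratio = 2 ** (-8 / num_heads)
    return [ratio ** (i + 1) for i in range(num_heads)]

# For 8 heads the slopes halve from one head to the next:
print(alibi_slopes(8))  # [0.5, 0.25, 0.125, ..., 0.00390625]
```

Each head then adds its slope times the (negative) key-query distance to the attention logits; no slope is learned or tuned individually.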

Updated 2026-05-02


Tags

Ch.2 Generative Models - Foundations of Large Language Models