Concept

Geometric Progression for ALiBi's β Scalar per Head

While the ALiBi bias scalar β can be tuned per head, research shows that an effective alternative for multi-head attention is to set β to values that decrease geometrically across the heads by a factor of 1/2^a. This heuristic performs well across a variety of tasks without requiring per-head tuning on a validation dataset.
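The geometric schedule above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes the common ALiBi choice of a ratio of 2^(-8/n) for n heads, so that for 8 heads the per-head slopes are 1/2, 1/4, ..., 1/256.

```python
def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head ALiBi bias slopes as a geometric sequence.

    Assumes the commonly used ratio 2 ** (-8 / num_heads), which for
    power-of-two head counts yields slopes 2**-1, 2**-2, ..., 2**-8
    spread across the heads. Head i (1-indexed) gets ratio ** i.
    """
    ratio = 2 ** (-8 / num_heads)
    return [ratio ** (i + 1) for i in range(num_heads)]

# For 8 heads the slopes halve from one head to the next:
print(alibi_slopes(8))  # [0.5, 0.25, 0.125, ..., 0.00390625]
```

Each head then adds its slope times the (negative) key-query distance to the attention logits; no slope is learned or tuned individually.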

Updated 2026-05-02


Tags

Ch.2 Generative Models - Foundations of Large Language Models