Learn Before
Complete ALiBi Attention Formula
The final attention weight in the ALiBi framework, denoted as α(i, j), is computed by applying the Softmax function to the attention score. This score is derived by adding the ALiBi positional bias term, β⋅(j−i), to the standard query-key product q_iᵀk_j, scaling the sum by the inverse square root of the dimension d, and incorporating an optional mask. The complete equation is expressed as:
α(i, j) = Softmax((q_iᵀk_j + β⋅(j−i))/√d + Mask(i, j))
In this formula, q_i and k_j denote the query and key vectors, and 1/√d acts as a scaling factor. The Mask(i, j) term ensures proper attention masking when required.
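To make the computation concrete, the following is a minimal NumPy sketch of the formula above; the function name, sequence length, head dimension, and the slope value β = −0.5 are illustrative assumptions rather than values taken from the text. Note that, following the equation as written here, the positional bias is added to the dot product before the 1/√d scaling.

```python
# Minimal sketch of the complete ALiBi attention formula described above:
# alpha(i, j) = Softmax((q_i^T k_j + beta * (j - i)) / sqrt(d) + Mask(i, j)).
# Sizes and the slope value beta are illustrative assumptions.

import numpy as np

def alibi_attention_weights(Q, K, beta, causal=True):
    """Compute ALiBi attention weights for one attention head.

    Q, K  : arrays of shape (seq_len, d) holding query and key vectors.
    beta  : scalar slope applied to the distance (j - i), typically negative.
    causal: if True, add -inf for positions j > i so they receive zero weight.
    """
    seq_len, d = Q.shape

    # Standard query-key dot products: q_i^T k_j for every (i, j) pair.
    scores = Q @ K.T

    # ALiBi linear positional bias: beta * (j - i).
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    bias = beta * (j - i)

    # Add the bias, then scale by 1 / sqrt(d), as in the formula above.
    scores = (scores + bias) / np.sqrt(d)

    # Optional causal mask: -inf for future positions (j > i).
    if causal:
        scores = np.where(j > i, -np.inf, scores)

    # Row-wise Softmax turns the scores into attention weights alpha(i, j).
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Tiny usage example with made-up sizes.
rng = np.random.default_rng(0)
Q = rng.normal(size=(6, 8))   # 6 tokens, head dimension 8
K = rng.normal(size=(6, 8))
alpha = alibi_attention_weights(Q, K, beta=-0.5)
print(alpha.round(3))         # each row sums to 1; future positions get weight 0
```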

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Complete ALiBi Attention Formula
Calculating a Pre-Softmax Attention Score with Linear Bias
In a model that adds a linear positional bias to its attention calculation, a query at position i = 10 attends to two keys at positions j₁ = 5 and j₂ = 2. Assuming the scaled dot-product portion of the score is identical for both keys, how will the addition of the positional bias term PE(i, j) affect their final pre-Softmax attention scores?
Interaction of Semantic and Positional Scores
Learn After
A language model computes its pre-normalized attention scores using the formula:
Score = (query_vector ⋅ key_vector + β ⋅ (key_position - query_position)) / sqrt(dimension). In this model, the scalar hyperparameter β is a fixed negative number. Consider a query token at position i = 10. How does the bias term β ⋅ (key_position - query_position) influence the scores for a key token at position j = 12 compared to a key token at position j = 20, assuming all other components of the score are equal for both keys?
Calculating a Pre-Softmax Attention Score with Positional Bias
In a language model using the complete ALiBi attention formula for causal text generation, the model needs to prevent a query token at position i from attending to any key token at a future position j (where j > i). How does the Mask(i, j) term within the formula α(i, j) = Softmax((q_iᵀk_j + β⋅(j−i))/√d + Mask(i, j)) achieve this?
Modeling Arbitrarily Long Sequences with ALiBi
Tuning the ALiBi Bias Scalar (β)
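For concreteness, here is a brief worked sketch of the arithmetic behind the preview questions above; the slope value β = −0.5 is an assumption chosen purely for illustration. With a query at position i = 10, the bias term β⋅(j−i) contributes −0.5⋅(12−10) = −1.0 to the score of a key at j = 12 and −0.5⋅(20−10) = −5.0 to a key at j = 20, so with equal dot products the nearer key keeps the larger pre-Softmax score. Under the complete causal formula, a future key (j > i) additionally receives Mask(i, j) = −∞, which drives its post-Softmax weight to zero regardless of the bias.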