Learn Before
  • Synthesis of T5 Bias Bucketing Rules

Formula for Applying T5 Relative Position Bias

The T5 relative position bias is incorporated directly into the attention score calculation. A learnable scalar bias, denoted $u_{b(i-j)}$, is added to the query-key dot product. This sum is then scaled by the inverse square root of the head dimension $d$ before the Softmax function is applied. The specific bias value is determined by the bucket $b(i-j)$ that corresponds to the relative offset between the query at position $i$ and the key at position $j$. The complete formula for the attention score $\alpha(i, j)$ is:

$$\alpha(i, j) = \mathrm{Softmax}\!\left( \frac{q_i^T k_j + u_{b(i-j)}}{\sqrt{d}} + \mathrm{Mask}(i, j) \right)$$

where $\mathrm{Mask}(i, j)$ is the attention mask.
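The formula can be sketched directly in NumPy. Note that `bucket_fn` and `bias_table` below are placeholders, not part of the formula: any mapping from relative offsets $i-j$ to bucket indices, and any learned vector of scalar biases indexed by bucket, would fit this sketch.

```python
import numpy as np

def t5_biased_attention(Q, K, bias_table, bucket_fn, mask=None):
    """Attention weights with a T5-style relative position bias.

    The scalar bias u_{b(i-j)} is added to the raw dot product
    q_i^T k_j, the sum is scaled by 1/sqrt(d), the mask is added,
    and Softmax is taken over the key axis -- per the formula above.
    """
    n, d = Q.shape
    scores = Q @ K.T                                          # q_i^T k_j for all i, j
    offsets = np.arange(n)[:, None] - np.arange(n)[None, :]   # relative offsets i - j
    buckets = bucket_fn(offsets)                              # b(i - j)
    scores = (scores + bias_table[buckets]) / np.sqrt(d)      # bias added BEFORE scaling
    if mask is not None:
        scores = scores + mask                                # e.g. 0 / -inf additive mask
    scores -= scores.max(axis=-1, keepdims=True)              # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)
```

For a causal decoder, `mask` would be an upper-triangular matrix of `-inf` above the diagonal and `0` elsewhere; for full self-attention it can be omitted.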


Tags
  • Ch.2 Generative Models - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences

Related
  • Unified Formula for T5 Bias Bucketing

  • Example of T5 Bias Bucketing

  • Visual Representation of T5 Bias Application ($n_b = 3$, $\mathrm{dist}_{\max} = 5$)

  • A model designer is implementing a mechanism to account for the relative distance between tokens in a sequence. The proposed strategy uses a unique, learnable value for each of the first few relative distances (e.g., 1, 2, 3...), but then groups larger distances into a smaller set of shared values, with the size of these groups increasing as the distance grows. What is the primary trade-off this combined approach is designed to optimize?

  • Analysis of a Hybrid Positional Bucketing System

  • Formula for Applying T5 Relative Position Bias

  • Generalization Advantage of T5 Positional Bias

  • A model uses a hybrid strategy to handle relative positional distances between tokens, assigning each distance to one of a limited number of 'buckets'. The rules are:

    1. For small distances (e.g., 0-15), each distance is assigned to its own unique bucket.
    2. For medium distances, the ranges of distances assigned to a single bucket grow progressively larger as the distance increases.
    3. For very large distances (e.g., beyond 512), all are assigned to a single, final bucket.

    Based on this system, which of the following distances is most likely to be assigned to the same bucket as the distance 40?
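The three bucketing rules above can be sketched as a single function. This is a minimal sketch of a T5-style scheme, assuming logarithmically spaced buckets in the medium range; the thresholds `num_exact=16` and `max_distance=512` mirror the example numbers in the question, and `num_buckets=32` is an illustrative choice, not a value given in the text.

```python
import math

def relative_bucket(distance, num_exact=16, max_distance=512, num_buckets=32):
    """Map a nonnegative relative distance to a bucket index.

    Rule 1: distances below num_exact each get their own bucket.
    Rule 2: medium distances share log-spaced ranges that widen
            as the distance grows.
    Rule 3: distances at or beyond max_distance all share the
            final bucket.
    """
    if distance < num_exact:
        return distance
    if distance >= max_distance:
        return num_buckets - 1
    # Log-spaced buckets between num_exact and max_distance:
    # equal steps in log(distance) map to equal steps in bucket index,
    # so each successive bucket covers a wider range of distances.
    log_ratio = math.log(distance / num_exact) / math.log(max_distance / num_exact)
    return num_exact + int(log_ratio * (num_buckets - num_exact))
```

Under these assumed parameters, nearby medium distances such as 40 and 45 fall in the same bucket, while a much larger distance like 100 lands in a later one, illustrating why the answer hinges on how wide the bucket containing 40 is.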

Learn After
  • In a standard attention mechanism, an attention score is computed from a query vector (q) and a key vector (k). Consider a modification where a learnable scalar bias is added directly to the query-key dot product before the result is scaled and passed through a Softmax function. The value of this bias is determined solely by the relative distance between the query and key. How does this specific modification influence the attention mechanism's behavior?

  • Calculating T5 Attention Score with Relative Position Bias

  • A researcher implements a modified attention mechanism where the learnable scalar bias, based on relative position, is applied after the query-key dot product is scaled. The formula used is: $\alpha(i, j) = \mathrm{Softmax}\!\left( \frac{q_i^T k_j}{\sqrt{d}} + u_{b(i-j)} + \mathrm{Mask}(i, j) \right)$ What is the most significant consequence of this specific modification compared to the standard approach of adding the bias before scaling?