
Formula for Applying T5 Relative Position Bias

The T5 relative position bias is incorporated directly into the attention score calculation. A learnable scalar bias, denoted $u_{b(i-j)}$, is added to the query-key dot product. This sum is then scaled by dividing by the square root of the key dimension, $\sqrt{d}$, before the Softmax function is applied. The specific bias value is determined by the bucket $b(i-j)$ that corresponds to the relative offset between the query at position $i$ and the key at position $j$. The complete formula for the attention score $\alpha(i,j)$ is:

$$\alpha(i,j) = \mathrm{Softmax}\left(\frac{\mathbf{q}_i \mathbf{k}_j^{\mathrm{T}} + u_{b(i-j)}}{\sqrt{d}} + \mathrm{Mask}(i,j)\right)$$

where $\mathrm{Mask}(i,j)$ is the attention mask and the Softmax normalizes over all key positions $j$.
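To make the formula concrete, below is a minimal NumPy sketch of this attention computation. It is illustrative rather than a reference implementation: the names (`attention_scores`, `bucket_fn`, `bias_table`) are hypothetical, and the clipped-offset bucketing in the toy usage is a simplified stand-in for T5's actual log-spaced bucketing of relative offsets.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_scores(Q, K, bias_table, bucket_fn, mask):
    """alpha(i, j) = Softmax((q_i k_j^T + u_{b(i-j)}) / sqrt(d) + Mask(i, j)).

    Q, K:        (n, d) query and key matrices.
    bias_table:  1-D array of learnable scalars u, indexed by bucket.
    bucket_fn:   maps a relative offset (i - j) to a bucket index b(i - j).
    mask:        (n, n) additive attention mask (0 or -inf entries).
    """
    n, d = Q.shape
    logits = Q @ K.T                                         # q_i . k_j^T
    offsets = np.arange(n)[:, None] - np.arange(n)[None, :]  # i - j
    buckets = np.vectorize(bucket_fn)(offsets)               # b(i - j)
    logits = logits + bias_table[buckets]                    # + u_{b(i-j)}
    return softmax(logits / np.sqrt(d) + mask, axis=-1)

# Toy usage: 4 positions, d = 8, offsets clipped to [-3, 3] (7 buckets).
rng = np.random.default_rng(0)
n, d = 4, 8
Q, K = rng.normal(size=(n, d)), rng.normal(size=(n, d))
bias_table = rng.normal(size=7)
bucket_fn = lambda off: int(np.clip(off, -3, 3)) + 3  # simplified bucketing, not T5's
causal_mask = np.triu(np.full((n, n), -np.inf), k=1)  # Mask(i, j): block j > i
alpha = attention_scores(Q, K, bias_table, bucket_fn, causal_mask)
print(alpha.round(3))  # each row sums to 1
```

Note that, following the formula above, the sketch adds the bias before dividing by $\sqrt{d}$; the original T5 implementation instead omits this scaling altogether, folding it into weight initialization.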
