Unified Formula for T5 Bias Bucketing

The T5 relative position bias mechanism uses a single piecewise function to assign any relative position offset, $i - j$, to a specific bucket index, $b(i - j)$. This function combines a direct, one-to-one mapping for smaller offsets with a logarithmic grouping for larger offsets. The complete expression is:

$$
b(i - j) = \begin{cases} i - j & 0 \le i - j < \frac{n_b + 1}{2} \\ \min\left(n_b, \frac{n_b + 1}{2} + \left\lfloor \frac{\log(i - j) - \log\left(\frac{n_b + 1}{2}\right)}{\log(\mathrm{dist}_{\mathrm{max}}) - \log\left(\frac{n_b + 1}{2}\right)} \cdot \frac{n_b + 1}{2} \right\rfloor\right) & i - j \ge \frac{n_b + 1}{2} \end{cases}
$$
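
To make the two branches concrete, here is a minimal Python sketch that evaluates the piecewise formula directly. The function name `t5_bucket` and the default values `n_b = 31` and `dist_max = 128` are assumptions chosen for illustration; they are not stated on this page. The sketch follows the formula as written for non-negative offsets and is not a copy of any particular library's implementation.

```python
import math


def t5_bucket(offset: int, n_b: int = 31, dist_max: int = 128) -> int:
    """Map a non-negative offset i - j to a bucket index b(i - j).

    n_b and dist_max defaults are illustrative assumptions.
    """
    if offset < 0:
        raise ValueError("this sketch only covers offsets with i - j >= 0")

    half = (n_b + 1) / 2  # boundary between the exact and logarithmic ranges

    # Small offsets: one bucket per offset.
    if offset < half:
        return offset

    # Large offsets: position of log(offset) within [log(half), log(dist_max)],
    # scaled to the remaining buckets and capped at n_b.
    ratio = (math.log(offset) - math.log(half)) / (math.log(dist_max) - math.log(half))
    return int(min(n_b, half + math.floor(ratio * half)))


if __name__ == "__main__":
    for d in (0, 5, 15, 16, 32, 64, 128, 1000):
        print(d, t5_bucket(d))
```

Under these assumed settings, offsets below 16 each get their own bucket, while progressively wider ranges of larger offsets share a bucket, and every offset at or beyond `dist_max` falls into bucket `n_b`.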

Updated 2026-04-23

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences