Formula

Formula for Logarithmic Bucketing in T5 Bias

In the T5 bias mechanism, the bucket number $b(i-j)$ for a large offset, where $i - j \ge \frac{n_b+1}{2}$, is calculated on a logarithmic scale. The formula is defined as:

$$b(i - j) = \frac{n_b+1}{2} + \left\lfloor \frac{\log(i - j) - \log\left(\frac{n_b+1}{2}\right)}{\log(\mathrm{dist}_{\mathrm{max}}) - \log\left(\frac{n_b+1}{2}\right)} \cdot \frac{n_b+1}{2} \right\rfloor$$

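This equation maps the offset to a bucket index by normalizing its logarithmic position between $\frac{n_b+1}{2}$ and a maximum distance, $\mathrm{dist}_{\mathrm{max}}$, scaling that fraction by $\frac{n_b+1}{2}$, the number of logarithmic buckets, and shifting the result by $\frac{n_b+1}{2}$ so that the logarithmic buckets start at that index.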
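As a rough illustration, the formula can be evaluated directly in a few lines of Python. This is a minimal sketch, not T5's actual implementation: the function name `log_bucket` and its parameter names are hypothetical, and it assumes $n_b$ is odd so that $\frac{n_b+1}{2}$ is an integer. It covers only the large-offset case described here.

```python
import math


def log_bucket(offset: int, n_b: int, dist_max: int) -> int:
    """Evaluate the logarithmic bucketing formula for a large offset.

    Hypothetical helper for illustration only. Assumes n_b is odd so
    that (n_b + 1) / 2 is an integer, and that dist_max > (n_b + 1) / 2.
    """
    half = (n_b + 1) // 2  # first logarithmic bucket, and number of log buckets
    if offset < half:
        raise ValueError("formula applies only when offset >= (n_b + 1) / 2")

    # Fraction of the way from log(half) to log(dist_max), in log space.
    ratio = (math.log(offset) - math.log(half)) / (
        math.log(dist_max) - math.log(half)
    )

    # Scale to the number of logarithmic buckets, floor, then shift by half.
    # Note: at offset == dist_max the raw formula yields n_b + 1, so a real
    # implementation would likely clamp the result (an assumption here).
    return half + math.floor(ratio * half)


# Example with n_b = 31 (so half = 16) and dist_max = 128.
print(log_bucket(16, 31, 128))   # 16
print(log_bucket(32, 31, 128))   # 21
print(log_bucket(64, 31, 128))   # 26
print(log_bucket(127, 31, 128))  # 31
```

In this example, doubling the offset from 16 to 32 and again from 32 to 64 advances the bucket index by about the same amount each time, which is the behaviour the logarithmic scale is meant to produce.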

Updated 2026-04-23

Tags

Ch.2 Generative Models - Foundations of Large Language Models
