Learn Before
Visualization of T5 Bias Bucketing
An illustration of the T5 model's bucketing mechanism visualizes the piecewise distribution of query-key offsets into learnable parameter groups. In T5's default configuration of 32 buckets with a maximum distance of 128, the first half of the buckets on each side use a fixed size of one, maintaining a one-to-one mapping for small offset distances. In the second half, bucket capacity grows logarithmically, so progressively larger offsets are grouped together. The final bucket serves as a catch-all for any offsets that exceed the maximum distance, enabling the model to process sequences of arbitrary length.
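The scheme above can be sketched in a few lines of Python. This is a simplified, standalone sketch of the bidirectional variant (not the library implementation); the defaults of 32 buckets and a maximum distance of 128 are T5's defaults, not requirements.

```python
import math

def relative_position_bucket(relative_position, num_buckets=32, max_distance=128):
    """Map a query-key offset to a bucket index (bidirectional sketch)."""
    bucket = 0
    # Half the buckets serve negative offsets, half serve positive ones.
    num_buckets //= 2
    if relative_position > 0:
        bucket += num_buckets
    n = abs(relative_position)
    # First half of each side: exact one-to-one mapping for small offsets.
    max_exact = num_buckets // 2
    if n < max_exact:
        return bucket + n
    # Second half: bucket width grows logarithmically up to max_distance.
    log_bucket = max_exact + int(
        math.log(n / max_exact) / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    )
    # The last bucket is a catch-all for offsets beyond max_distance.
    return bucket + min(log_bucket, num_buckets - 1)
```

With the defaults, offsets 0 through 7 each get their own bucket per direction, offsets up to 128 share logarithmically widening buckets, and anything farther lands in the final catch-all bucket.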
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Position Offsets in Causal vs. Bidirectional Attention
Calculating a Relative Position Bias Bucket
The T5 relative position bias bucketing formula is a piecewise function, treating small and large relative position offsets differently. For small offsets, it uses a direct one-to-one mapping to a bucket. For larger offsets, it transitions to a logarithmic mapping. What is the primary rationale behind this dual-strategy design?
A key characteristic of the T5 relative position bias bucketing formula is that positional precision deliberately degrades as the distance between tokens grows. For example, it distinguishes relative positions 10 and 20 by assigning them different buckets, but it maps relative positions 500 and 510 into the same catch-all bucket.
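A small numeric sketch makes the precision contrast concrete. The helper below computes the one-sided bucket index (assuming T5's defaults of 32 total buckets split between the two directions, and a maximum distance of 128); nearby small offsets land in distinct buckets while large offsets collapse into the shared catch-all bucket.

```python
import math

def one_sided_bucket(n, num_buckets=16, max_exact=8, max_distance=128):
    # Bucket index for a non-negative offset n (one direction only).
    if n < max_exact:
        return n  # exact one-to-one region
    log_bucket = max_exact + int(
        math.log(n / max_exact) / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    )
    return min(log_bucket, num_buckets - 1)  # clamp to catch-all

print(one_sided_bucket(10), one_sided_bucket(20))    # → 8 10 (distinct)
print(one_sided_bucket(500), one_sided_bucket(510))  # → 15 15 (same)
```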