Example

Visualization of T5 Bias Bucketing

This illustration of the T5 model's bucketing mechanism shows how query-key offsets are distributed piecewise into buckets that share learnable bias parameters. In a configuration with $n_b = 32$ buckets and a maximum distance of $\mathrm{dist}_{\mathrm{max}} = 1024$, each bucket in the first half covers exactly one offset, which gives a one-to-one mapping for small offsets. In the second half, the bucket boundaries are spaced logarithmically, so each successive bucket spans a wider range and larger offsets are grouped together. The final bucket serves as a catch-all for any offsets beyond the preceding ranges, enabling the model to process sequences of arbitrary length.
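The mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: it assumes the causal (one-directional) case with non-negative offsets, follows the logarithmic formula used in publicly available T5 code, and uses the hypothetical name `t5_bucket`. The defaults mirror the configuration in the text ($n_b = 32$, $\mathrm{dist}_{\mathrm{max}} = 1024$).

```python
import math


def t5_bucket(offset, num_buckets=32, max_distance=1024):
    """Map a non-negative query-key offset to a bucket index (illustrative sketch).

    Assumption: causal attention, so offset = query_pos - key_pos >= 0.
    """
    if offset < 0:
        raise ValueError("this sketch only handles non-negative offsets")

    # First half of the buckets: one bucket per offset (exact mapping).
    max_exact = num_buckets // 2
    if offset < max_exact:
        return offset

    # Second half: position on a log scale between max_exact and max_distance,
    # stretched over the remaining buckets and truncated to an integer.
    scaled = math.log(offset / max_exact) / math.log(max_distance / max_exact)
    bucket = max_exact + int(scaled * (num_buckets - max_exact))

    # Anything that would land past the last bucket is clamped into it.
    return min(bucket, num_buckets - 1)


if __name__ == "__main__":
    for off in (0, 7, 15, 16, 64, 256, 1024, 100_000):
        print(off, "->", t5_bucket(off))
```

Under these assumptions, offsets 0 through 15 each get their own bucket, and every offset at or beyond `max_distance` is clamped into bucket 31, which is what allows a fixed set of 32 bias parameters to cover arbitrarily long sequences.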

Updated 2026-04-24

Tags

Foundations of Large Language Models

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences