Concept

Final Bucket for Offsets Exceeding dist_max in T5 Bias

In the T5 relative position bucketing scheme, when a query-key offset $i - j$ is strictly greater than the maximum expected distance $\mathrm{dist}_{\mathrm{max}}$, it is placed in the very last bucket. Bucket $n_b$ thus acts as a catch-all for every offset that was not assigned to one of the preceding one-to-one or logarithmic buckets. Because all offsets beyond $\mathrm{dist}_{\mathrm{max}}$ share this bucket, and therefore the same learned bias, the model can handle sequences of arbitrary length.
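The overflow rule can be illustrated with a minimal Python sketch. It is not T5's actual implementation: the function name `bucket_index`, the default values `n_b = 32` and `dist_max = 1024`, the split point `n_b // 2`, and the exact form of the logarithmic mapping are all assumptions chosen for illustration. Only the last branch, which sends every offset above `dist_max` to bucket `n_b`, reflects the rule described above.

```python
import math


def bucket_index(offset: int, n_b: int = 32, dist_max: int = 1024) -> int:
    """Map a non-negative query-key offset i - j to a bucket index in 0..n_b.

    Hypothetical sketch: the split point and the log mapping are assumptions.
    """
    if offset < 0:
        raise ValueError("this sketch only covers offsets with i >= j")

    n_exact = n_b // 2  # assumed number of one-to-one buckets

    # One-to-one region: each small offset gets its own bucket.
    if offset < n_exact:
        return offset

    # Final bucket: every offset strictly greater than dist_max lands here.
    if offset > dist_max:
        return n_b

    # Logarithmic region: spread [n_exact, dist_max] over buckets
    # n_exact .. n_b - 1, with bucket width growing with distance.
    frac = math.log(offset / n_exact) / math.log(dist_max / n_exact)
    return min(n_b - 1, n_exact + int(frac * (n_b - n_exact)))


# Offsets far beyond dist_max all collapse into the same final bucket.
overflow_buckets = {bucket_index(d) for d in (1025, 5000, 10**9)}
```

Under these assumptions, `overflow_buckets` contains a single value, `n_b`, no matter how large the offset becomes, which is why one learned bias suffices for all distances beyond `dist_max`.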

Updated 2026-04-24

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences