Learn Before
Logarithmic Bucketing for Larger T5 Offsets
Within the T5 relative bias framework, relative position offsets that exceed the one-to-one mapping threshold are grouped into buckets that grow logarithmically in size. Specifically, for the remaining buckets, indexed from n_b/2 up to n_b − 1 (where n_b is the total number of buckets), each bucket encompasses a logarithmically increasing range of offsets. This bucketing strategy enables the architecture to handle extensive sequences by generalizing to larger distances without dedicating a unique parameter to every single offset.
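As a concrete illustration, the rule can be sketched in Python. This is a minimal sketch assuming the common T5 convention (half the buckets map offsets one-to-one, the rest grow logarithmically up to a maximum distance); the names `n_buckets` and `dist_max` are illustrative, not taken from the text above.

```python
import math

def relative_bucket(offset, n_buckets=32, dist_max=128):
    """Map a nonnegative query-key offset to a bucket index (sketch)."""
    max_exact = n_buckets // 2
    if offset < max_exact:
        # Small offsets: one-to-one mapping, each offset gets its own bucket.
        return offset
    # Larger offsets: position within the log-spaced range [max_exact, dist_max)
    # determines the bucket, so bucket widths grow logarithmically.
    bucket = max_exact + int(
        math.log(offset / max_exact)
        / math.log(dist_max / max_exact)
        * (n_buckets - max_exact)
    )
    # Offsets beyond dist_max all fall into the final bucket.
    return min(bucket, n_buckets - 1)
```

With these defaults, nearby offsets keep distinct buckets while very distant offsets collapse into the same final bucket, so only `n_buckets` bias parameters are needed regardless of sequence length.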

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formula Component for T5 Bias Bucketing
One-to-One Mapping for Initial T5 Bias Buckets
Logarithmic Bucketing for Larger T5 Offsets
Synthesis of T5 Bias Bucketing Rules
A developer is implementing a relative position bias mechanism where query-key offsets are grouped into a limited number of 'buckets', with each bucket sharing a single learnable parameter. They use a hyperparameter, n_b, as the basis for determining the number of buckets. Their code allocates an array of size n_b to store these learnable parameters. Based on the typical structure of this mechanism, what is the fundamental flaw in this approach?
Parameter Initialization for Positional Bucketing
In a relative position bias system where query-key offsets are grouped into a set of buckets, if a hyperparameter n_b is defined as the basis for the number of buckets, the system will utilize exactly n_b learnable bias parameters, one for each bucket.
Learn After
Formula for Logarithmic Bucketing in T5 Bias
Final Bucket for Offsets Exceeding dist_max in T5 Bias
Parameter Efficiency for Long-Range Dependencies
A model needs to represent the relative distance between elements in a long sequence using a limited number of shared parameters (buckets). The model's designers have determined that precise distance is important for nearby elements, but for elements that are far apart, a less precise, general sense of distance is sufficient. Which bucketing strategy best balances parameter efficiency with this modeling requirement?
In a model that uses logarithmic bucketing for large relative position offsets, it is plausible that the same learned bias parameter would be applied to an offset of 500 as to an offset of 510, while offsets of 10 and 20 would likely receive distinct bias parameters.
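This claim can be checked with a small self-contained script. The bucketing function below is a hypothetical sketch using common T5-style defaults (32 buckets, maximum distance 128); it is not the exact formula from the course, but it reproduces the qualitative behavior described above.

```python
import math

def bucket(offset, n_buckets=32, dist_max=128):
    # Half the buckets map offsets one-to-one; the rest are logarithmic.
    max_exact = n_buckets // 2
    if offset < max_exact:
        return offset
    b = max_exact + int(
        math.log(offset / max_exact)
        / math.log(dist_max / max_exact)
        * (n_buckets - max_exact)
    )
    return min(b, n_buckets - 1)

# Distant offsets share a bucket; nearby offsets stay distinct.
for off in (10, 20, 500, 510):
    print(off, "->", bucket(off))
# prints: 10 -> 10, 20 -> 17, 500 -> 31, 510 -> 31
```

Offsets 500 and 510 both land in the final bucket and therefore share one bias parameter, while offsets 10 and 20 fall in different buckets and receive distinct parameters, matching the statement above.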