Concept

Number of Buckets for T5 Bias Terms

In the T5 relative position bias implementation, the learnable bias parameters are associated with a set of $n_b + 1$ distinct "buckets." This structure groups various query-key offsets together: all relative position encodings $\mathrm{PE}(i,j)$ whose offset $i-j$ falls into the same bucket share the exact same bias term, denoted $u_{b(i-j)}$.
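As a rough illustration of the bucket sharing described above, the following minimal sketch maps an offset $i-j$ to a bucket index in $0, \dots, n_b$ and looks up a shared bias. The specific bucketing rule (one bucket per offset for small offsets, logarithmically spaced buckets for larger ones), the values `n_b = 31` and `dist_max = 128`, the restriction to non-negative offsets, and the names `bucket` and `bias` are all assumptions made for illustration, not details taken from this section.

```python
import math

def bucket(offset, n_b=31, dist_max=128):
    """Map a non-negative offset i - j to a bucket index in 0..n_b.

    Assumed rule (illustrative only): offsets below (n_b + 1) / 2 each get
    their own bucket; larger offsets are grouped on a log scale up to
    dist_max, and everything beyond that lands in the last bucket n_b.
    """
    if offset < 0:
        raise ValueError("this sketch only handles offsets i - j >= 0")
    half = (n_b + 1) // 2
    if offset < half:
        return offset
    # Position of the offset on a log scale between half and dist_max.
    scaled = math.log(offset / half) / math.log(dist_max / half) * half
    return min(n_b, half + int(scaled))

def bias(i, j, u, n_b=31, dist_max=128):
    """Return the shared bias term u[b(i - j)] for query i and key j."""
    return u[bucket(i - j, n_b, dist_max)]

# One learnable scalar per bucket: n_b + 1 parameters in total.
n_b = 31
u = [0.1 * k for k in range(n_b + 1)]

# Offsets 19 and 20 fall into the same bucket, so they share one bias term.
same = bias(25, 6, u) == bias(25, 5, u)
num_buckets = len({bucket(o) for o in range(1000)})
```

In this sketch, `num_buckets` counts how many distinct bucket indices are produced over a wide range of offsets; under the assumed rule it equals `n_b + 1`, matching the number of bias parameters.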

Updated 2026-04-24
