1Cademy - One-to-One Mapping for Initial T5 Bias Buckets

Learn Before

Number of Buckets for T5 Bias Terms

Concept

One-to-One Mapping for Initial T5 Bias Buckets

In the T5 relative positional encoding scheme, the initial range of buckets maintains a direct, one-to-one correspondence with the query-key offsets. Specifically, for buckets indexed from $0$ up to $\frac{n_b + 1}{2} - 1$, where $n_b$ is the number of buckets, each bucket is assigned to a single unique offset (i.e., bucket $0$ matches offset $0$, bucket $1$ matches offset $1$, and so forth). Writing $i$ for the query position and $j$ for the key position, this direct mapping is denoted by the function $b(i - j) = i - j$.
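The rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the T5 implementation: the function name `one_to_one_bucket` is hypothetical, the bound $\frac{n_b + 1}{2}$ is rounded down with integer division (an assumption for the case where it is not a whole number), and offsets outside the one-to-one range simply return `None`, since the mapping for larger offsets is not covered here.

```python
from typing import Optional


def one_to_one_bucket(query_pos: int, key_pos: int, n_b: int) -> Optional[int]:
    """Return the bucket index for a query-key pair in the one-to-one range.

    Hypothetical helper: implements only b(i - j) = i - j for
    0 <= i - j < (n_b + 1) / 2. Other offsets return None.
    """
    offset = query_pos - key_pos
    # Assumption: round the bound down when (n_b + 1) / 2 is not an integer.
    limit = (n_b + 1) // 2
    if 0 <= offset < limit:
        return offset  # bucket index equals the offset
    return None  # outside the one-to-one range; not handled in this sketch


# Illustrative value only: n_b = 31 gives a one-to-one range of offsets 0..15.
print(one_to_one_bucket(query_pos=7, key_pos=4, n_b=31))
```

With the illustrative value `n_b = 31`, the one-to-one range covers offsets 0 through 15, so a query at position 7 and a key at position 4 (offset 3) fall in bucket 3.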

Updated 2026-04-23

References

Reference of Foundations of Large Language Models Course

Learn After

Formula for One-to-One Mapping in T5 Bias Bucketing
In a transformer model that uses a relative position bias mechanism, a specific set of initial 'buckets' is used to store shared bias parameters. For small, non-negative relative distances between a query and a key, there is a direct correspondence where the bucket index is identical to the distance. If a query is at position 8 and a key is at position 5, what is the index of the bucket used for their interaction?
Consider a transformer model's attention mechanism that uses a set of 'buckets' to store shared parameters for relative positions. For small, non-negative distances between a query and a key, a direct one-to-one correspondence is used where the bucket index is identical to the distance. Based on this rule, an interaction between a query at position 5 and a key at position 2 would be assigned to bucket index 3.
In a specific attention mechanism, shared parameters for interactions between tokens are stored in 'buckets' based on the distance between them. For the first several buckets, a simple rule applies: the bucket index is identical to the distance. If the distance between two tokens is 4, the interaction parameter will be retrieved from bucket number ____.
