1Cademy

Learn Before

Number of Buckets for T5 Bias Terms

Multiple Choice

A developer is implementing a relative position bias mechanism where query-key offsets are grouped into a limited number of 'buckets', with each bucket sharing a single learnable parameter. They use a hyperparameter, n_b, as the basis for determining the number of buckets. Their code allocates an array of size n_b to store these learnable parameters. Based on the typical structure of this mechanism, what is the fundamental flaw in this approach?
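One way to probe the question is to enumerate the bucket indices explicitly. The sketch below is a minimal illustration, not the reference implementation: it assumes a causal setting (non-negative offsets), an odd `n_b`, and one common convention in which small offsets get their own bucket, larger offsets are binned logarithmically, and offsets at or beyond a cutoff share the last bucket. The function name `bucket_index` and the cutoff `dist_max` are hypothetical names chosen for this example.

```python
import math


def bucket_index(offset, n_b, dist_max):
    """Map a non-negative query-key offset to a bucket index.

    Assumed convention (for illustration only): indices run from 0 up to
    and including n_b, and n_b is odd so that (n_b + 1) // 2 is exact.
    """
    half = (n_b + 1) // 2
    if offset < half:
        # Small offsets: one bucket per exact offset.
        return offset
    if offset >= dist_max:
        # Very large offsets: all share the final bucket.
        return n_b
    # Mid-range offsets: logarithmically spaced buckets.
    scaled = math.log(offset / half) / math.log(dist_max / half) * half
    return min(n_b, half + int(scaled))


# Illustrative values, not taken from the question.
n_b, dist_max = 31, 128
indices = {bucket_index(d, n_b, dist_max) for d in range(512)}

# Compare the largest index and the number of distinct buckets with the
# size of the parameter array the developer allocated (n_b).
print(min(indices), max(indices), len(indices))
```

Comparing the printed largest index with an array of length `n_b` shows whether every bucket has a slot under this assumed convention.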

Updated 2025-09-26

