Learn Before
Consider a transformer model's attention mechanism that uses a set of 'buckets' to store shared parameters for relative positions. For small, non-negative distances between a query and a key, a direct one-to-one correspondence is used where the bucket index is identical to the distance. Based on this rule, an interaction between a query at position 5 and a key at position 2 would be assigned to bucket index 3.
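The one-to-one rule above can be sketched in Python. The function name, the `num_exact` cutoff, and the error raised for out-of-range distances are illustrative assumptions; T5's full scheme additionally uses log-spaced buckets for larger distances, which is beyond this card's rule:

```python
def relative_position_bucket(query_pos, key_pos, num_exact=16):
    """Map a (query, key) position pair to a relative-position bias bucket.

    For small non-negative distances (query_pos - key_pos), the bucket
    index equals the distance itself. The cutoff `num_exact` and the
    fallback behavior are assumptions for illustration only.
    """
    distance = query_pos - key_pos
    if 0 <= distance < num_exact:
        # One-to-one region: bucket index is identical to the distance.
        return distance
    raise NotImplementedError(
        "larger or negative distances fall outside the one-to-one region"
    )

# Query at position 5, key at position 2: distance 3, so bucket index 3.
print(relative_position_bucket(5, 2))
```

Running the example confirms the card's answer: a distance of 3 maps directly to bucket index 3.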
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Formula for One-to-One Mapping in T5 Bias Bucketing
In a transformer model that uses a relative position bias mechanism, a specific set of initial 'buckets' is used to store shared bias parameters. For small, non-negative relative distances between a query and a key, there is a direct correspondence where the bucket index is identical to the distance. If a query is at position 8 and a key is at position 5, what is the index of the bucket used for their interaction?
In a specific attention mechanism, shared parameters for interactions between tokens are stored in 'buckets' based on the distance between them. For the first several buckets, a simple rule applies: the bucket index is identical to the distance. If the distance between two tokens is 4, the interaction parameter will be retrieved from bucket number ____.