Multiple Choice

A developer is implementing a relative position bias mechanism where query-key offsets are grouped into a limited number of 'buckets', with each bucket sharing a single learnable parameter. They use a hyperparameter, n_b, as the basis for determining the number of buckets. Their code allocates an array of size n_b to store these learnable parameters. Based on the typical structure of this mechanism, what is the fundamental flaw in this approach?
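To make the setup concrete, here is a minimal sketch of a bucketed bias lookup. The names `toy_bucket` and `bias_matrix` and the clamping rule are hypothetical illustrations, not the mechanism's actual bucketing function, and the sketch deliberately leaves open how the number of buckets relates to n_b. It shows only the shared-parameter idea: each offset maps to a bucket index, and the parameter array needs one entry for every index the bucketing rule can produce.

```python
import numpy as np

def toy_bucket(offset: int, num_buckets: int) -> int:
    """Hypothetical rule: clamp the query-key offset i - j into [0, num_buckets - 1]."""
    return min(max(offset, 0), num_buckets - 1)

def bias_matrix(seq_len: int, params: np.ndarray) -> np.ndarray:
    """Build a (seq_len, seq_len) bias matrix with one shared parameter per bucket."""
    num_buckets = len(params)  # one learnable scalar per bucket
    bias = np.empty((seq_len, seq_len))
    for i in range(seq_len):          # query position
        for j in range(seq_len):      # key position
            bias[i, j] = params[toy_bucket(i - j, num_buckets)]
    return bias

# Four buckets, so four parameters (values chosen arbitrarily for illustration).
params = np.array([0.0, 0.1, 0.2, 0.3])
bias = bias_matrix(6, params)
```

In this toy version, all offsets of 3 or more fall into the last bucket and therefore share the same bias value.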

Updated 2025-09-26

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy
