Learn Before
In a transformer architecture that uses a bucketed approach for relative positional information (as in T5), the mapping from a relative distance to a bucket index follows a predefined, non-trainable formula, but the scalar bias associated with each bucket is a learned parameter, trained jointly with the rest of the model.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a specific attention mechanism, the relative distance between any two positions in a sequence is mapped to one of a fixed number of 'buckets'. Each bucket has a single, corresponding scalar bias value that is added to the attention logits. Considering how such a model adapts to data, which statement best describes how the specific scalar bias value for each bucket is determined?
Designing a Relative Positional Bias Scheme
In a transformer architecture that uses a bucketed approach for relative positional information (as in T5), the mapping from a relative distance to a bucket index follows a predefined, non-trainable formula, but the scalar bias associated with each bucket is a learned parameter, trained jointly with the rest of the model. This lets the model adapt its positional preferences to the data while keeping the number of positional parameters small and independent of sequence length.
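The split between the fixed bucketing formula and the learned per-bucket biases can be sketched as follows. This is a minimal, simplified NumPy illustration of a T5-style unidirectional scheme, not the exact T5 implementation; the function name, bucket counts, and the random initialization of the bias table are illustrative assumptions.

```python
import numpy as np

def relative_position_bucket(relative_position, num_buckets=32, max_distance=128):
    """Map a signed relative distance (key_pos - query_pos) to a bucket index.

    This mapping is the fixed, non-trainable part of the scheme: half the
    buckets cover small distances exactly; the other half cover larger
    distances on a logarithmic scale (simplified, unidirectional case).
    """
    rp = -np.minimum(relative_position, 0)  # only attend to past positions
    max_exact = num_buckets // 2
    is_small = rp < max_exact
    # Logarithmically spaced buckets for larger distances.
    large = max_exact + (
        np.log(np.maximum(rp, 1) / max_exact)
        / np.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).astype(np.int64)
    large = np.minimum(large, num_buckets - 1)
    return np.where(is_small, rp, large)

# The bias *values*, by contrast, are trainable: one scalar per bucket
# (per attention head, in practice). Random values stand in for training here.
num_buckets = 32
bias_table = np.random.default_rng(0).normal(size=num_buckets)

seq_len = 6
positions = np.arange(seq_len)
rel = positions[None, :] - positions[:, None]         # key_pos - query_pos
buckets = relative_position_bucket(rel, num_buckets)  # fixed formula
bias = bias_table[buckets]                            # learned lookup
# `bias` would be added to the attention logits before the softmax.
```

Note the division of labor: because every pair of positions at a similar distance shares one bucket, gradients from the whole sequence update a handful of scalars, which is how the scheme "adapts to data" without per-position parameters.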