Learn Before
In a transformer architecture that uses a bucketed approach for relative positional information (as in T5), the mapping from a relative distance to a bucket index follows a predefined, non-trainable formula, but the scalar bias associated with each bucket is a learned parameter, trained jointly with the rest of the model.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a specific attention mechanism, the relative distance between any two positions in a sequence is mapped to one of a fixed number of 'buckets'. Each bucket has a single, corresponding scalar bias value that is added to the attention logits. Considering how such a model adapts to data, which statement best describes how the specific scalar bias value for each bucket is determined?
Designing a Relative Positional Bias Scheme
In a transformer architecture that uses a bucketed approach for relative positional information (as in T5), the mapping from a relative distance to a bucket index follows a predefined, non-trainable formula, but the scalar bias associated with each bucket is a learned parameter, trained jointly with the rest of the model. This lets the model adapt its positional preferences to the data while keeping the number of positional parameters small and independent of sequence length.
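The split between the fixed bucketing formula and the learned per-bucket biases can be sketched as follows. This is a minimal, simplified NumPy illustration of a T5-style unidirectional scheme, not the exact T5 implementation; the function name, bucket counts, and the random initialization of the bias table are illustrative assumptions.

```python
import numpy as np

def relative_position_bucket(relative_position, num_buckets=32, max_distance=128):
    """Map a signed relative distance (key_pos - query_pos) to a bucket index.

    This mapping is the fixed, non-trainable part of the scheme: half the
    buckets cover small distances exactly; the other half cover larger
    distances on a logarithmic scale (simplified, unidirectional case).
    """
    rp = -np.minimum(relative_position, 0)  # only attend to past positions
    max_exact = num_buckets // 2
    is_small = rp < max_exact
    # Logarithmically spaced buckets for larger distances.
    large = max_exact + (
        np.log(np.maximum(rp, 1) / max_exact)
        / np.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).astype(np.int64)
    large = np.minimum(large, num_buckets - 1)
    return np.where(is_small, rp, large)

# The bias *values*, by contrast, are trainable: one scalar per bucket
# (per attention head, in practice). Random values stand in for training here.
num_buckets = 32
bias_table = np.random.default_rng(0).normal(size=num_buckets)

seq_len = 6
positions = np.arange(seq_len)
rel = positions[None, :] - positions[:, None]         # key_pos - query_pos
buckets = relative_position_bucket(rel, num_buckets)  # fixed formula
bias = bias_table[buckets]                            # learned lookup
# `bias` would be added to the attention logits before the softmax.
```

Note the division of labor: because every pair of positions at a similar distance shares one bucket, gradients from the whole sequence update a handful of scalars, which is how the scheme "adapts to data" without per-position parameters.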