Learn Before
Designing a Relative Positional Bias Scheme
Evaluate the two strategies presented in the case study. Which strategy allows the model to more flexibly adapt to the specific relational patterns in the training data, and why?
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a specific attention mechanism, the relative distance between any two positions in a sequence is mapped to one of a fixed number of 'buckets'. Each bucket has a single, corresponding scalar bias value that is added to the attention logits. Considering how such a model adapts to data, which statement best describes how the specific scalar bias value for each bucket is determined?
Designing a Relative Positional Bias Scheme
In a transformer architecture that uses a bucketed approach for relative positional information, the scalar bias associated with each bucket is determined by a predefined, non-trainable mathematical formula.
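The bucketed mechanism described in the related items above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the scheme from the case study: the helper names `relative_buckets` and `biased_logits` are hypothetical, the bucketing rule is a simple symmetric clip of the offset (real models often use a different mapping), and the `bucket_bias` vector is filled with random numbers as a stand-in for whatever values a model would actually hold, one scalar per bucket.

```python
import numpy as np

def relative_buckets(seq_len, max_dist):
    """Map every (query i, key j) pair to a bucket index (hypothetical clipping rule)."""
    pos = np.arange(seq_len)
    rel = pos[None, :] - pos[:, None]          # signed offset j - i
    # Offsets beyond +/- max_dist share the outermost buckets; shift to 0..2*max_dist.
    return np.clip(rel, -max_dist, max_dist) + max_dist

def biased_logits(logits, bucket_bias, max_dist):
    """Add one scalar bias per bucket to a (seq_len, seq_len) matrix of attention logits."""
    buckets = relative_buckets(logits.shape[-1], max_dist)
    return logits + bucket_bias[buckets]       # table lookup, then elementwise add

seq_len, max_dist = 6, 2
rng = np.random.default_rng(0)
# Assumption: placeholder values, one scalar per bucket (2 * max_dist + 1 buckets).
bucket_bias = rng.normal(size=2 * max_dist + 1)
logits = np.zeros((seq_len, seq_len))
out = biased_logits(logits, bucket_bias, max_dist)
```

Because the bias is looked up by bucket, every pair of positions that falls in the same bucket receives the same added value, regardless of where the pair sits in the sequence.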