Learn Before
Critique of a Relative Positional Bias Method
A language model's architecture incorporates a mechanism where a unique, learnable value is assigned to every possible relative distance between any two tokens. For example, a specific value is learned for a distance of 5, another for a distance of 6, and so on, up to the maximum distance encountered in the training data. Evaluate the suitability of this design for a model intended to process documents of highly variable and potentially unbounded lengths. Justify your reasoning.
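To make the mechanism concrete, here is a minimal sketch of a per-distance bias table. It is an illustration under stated assumptions, not any particular model's implementation: the names `make_bias_table`, `bias_for`, and `unseen_distances` are hypothetical, random values stand in for learned parameters, distance is taken as the absolute offset |i - j|, and the maximum training length of 1024 is an assumed example value.

```python
import random

def make_bias_table(max_train_len, seed=0):
    """One scalar per relative distance 0..max_train_len-1.

    Random values stand in for parameters that training would learn (assumption).
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(max_train_len)]

def bias_for(table, i, j):
    """Look up the bias for the token pair at positions i and j.

    Distance is assumed to be the absolute offset |i - j|.
    Raises IndexError when that distance has no entry in the table.
    """
    return table[abs(i - j)]

def unseen_distances(table, seq_len):
    """Distances that occur in a seq_len-token sequence but have no table entry."""
    return [d for d in range(seq_len) if d >= len(table)]

# Assumed example: sequences of at most 1024 tokens during training.
table = make_bias_table(max_train_len=1024)
print(len(table))                          # number of learned values: 1024
print(len(unseen_distances(table, 1024)))  # 0
print(len(unseen_distances(table, 1500)))  # 476 (distances 1024 through 1499)
```

The table grows by one parameter for every additional distance it must cover, and the helper `unseen_distances` simply counts which distances in a longer input fall outside it. How a real model treats such distances (clipping, bucketing, or failing) depends on its design and is not specified in the question.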
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is trained exclusively on text sequences with a maximum length of 1024 tokens. Its design includes a component where a unique, learnable numerical bias is assigned to every possible relative distance between token pairs (e.g., a specific bias for a distance of 1, another for a distance of 2, up to the maximum possible distance in the training data). What is the most likely outcome when this model is later tasked with processing a document of 1500 tokens?
Diagnosing Model Generalization Failure