Learn Before
Diagnosing Model Generalization Failure
Based on the provided scenario, identify the most likely architectural reason for the model's sharp performance decline on longer documents and explain your reasoning.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is trained exclusively on text sequences with a maximum length of 1024 tokens. Its design includes a component where a unique, learnable numerical bias is assigned to every possible relative distance between token pairs (e.g., a specific bias for a distance of 1, another for a distance of 2, up to the maximum possible distance in the training data). What is the most likely outcome when this model is later tasked with processing a document of 1500 tokens?
Critique of a Relative Positional Bias Method
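
The learned relative-bias scenario in the related question above (training capped at 1024 tokens, one learnable bias per relative distance) can be sketched as a plain lookup table. This is a minimal illustration, not any particular model's implementation: the names `LearnedRelativeBias`, `bias`, and `count_uncovered_pairs` are hypothetical, the random initial values stand in for trained parameters, and including an entry for distance 0 is an assumption made for convenience.

```python
import random

MAX_TRAIN_LEN = 1024  # maximum training sequence length from the scenario


class LearnedRelativeBias:
    """Hypothetical table holding one scalar bias per relative distance."""

    def __init__(self, max_len, seed=0):
        rng = random.Random(seed)
        # A sequence of max_len tokens contains distances 0 .. max_len - 1,
        # so those are the only distances that ever receive a parameter.
        # Random values stand in for what training would have learned.
        self.table = [rng.gauss(0.0, 0.02) for _ in range(max_len)]

    def bias(self, i, j):
        """Return the bias for the token pair at positions i and j."""
        distance = abs(i - j)
        # Raises IndexError when distance >= len(self.table): there is
        # simply no parameter for a distance never seen in training.
        return self.table[distance]


def count_uncovered_pairs(seq_len, max_len):
    """Count ordered (i, j) pairs whose distance has no table entry."""
    # For d >= 1, distance d occurs for 2 * (seq_len - d) ordered pairs.
    return sum(2 * (seq_len - d) for d in range(max_len, seq_len))


if __name__ == "__main__":
    rel_bias = LearnedRelativeBias(MAX_TRAIN_LEN)
    rel_bias.bias(0, 1023)  # largest distance seen in training: fine
    try:
        rel_bias.bias(0, 1024)  # first unseen distance
    except IndexError:
        print("no learned bias for distance 1024")
    print(count_uncovered_pairs(1500, MAX_TRAIN_LEN))
```

Under these assumptions, a 1500-token document contains 227052 ordered token pairs whose distance falls outside the table. A real implementation might raise an error, clip the distance to the largest trained value, or read entries that were never updated during training; which of these happens depends on the design and is not specified in the scenario.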