Diagnosing a Language Model's Flawed Coherence Judgment
Based on the description of the training process and the observed failure in the case study below, what is the most likely reason for the model's poor performance on the downstream task? Explain how the training setup encouraged this specific type of error.
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained on a task where it must determine if Sentence B is the actual sentence that follows Sentence A in a document. Which of the following training pairs is most likely to encourage the model to learn a simple, superficial shortcut for this task, rather than developing a deeper understanding of semantic coherence?
Simplicity of NSP Task as a Cause for Reliance on Superficial Cues
Unintended Learning in Sentence Relationship Models