Critique of a Pre-training Task Design
A research team is designing a pre-training task for a language model to help it understand discourse coherence. The task involves presenting the model with two sentences and having it predict whether the second sentence immediately follows the first. For negative examples (non-consecutive sentences), the team pairs a sentence from one document with a sentence from a completely unrelated document (e.g., one from a history text and one from a biology text). Analyze why this design might fail to teach the model a sophisticated understanding of sentence-to-sentence coherence. Specifically, explain how the relative ease of this task could encourage the model to adopt a simplistic learning strategy.
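The data construction described in the question can be sketched in Python to make the concern concrete. This is a minimal illustration built on stated assumptions, not the team's actual pipeline. The two-document `CORPUS`, the tiny `STOPWORDS` list, and the helper names `make_pairs` and `overlap_shortcut` are all hypothetical. `overlap_shortcut` stands in for any surface topic-matching cue a model might latch onto; it is not a claim about how a particular network learns.

```python
import random
import re

# Hypothetical toy corpus (assumption for illustration): two documents on
# unrelated topics, each a list of sentences in their original order.
CORPUS = {
    "history": [
        "The treaty ended the war in 1648.",
        "The war had devastated central Europe.",
        "Europe slowly rebuilt its trade networks.",
    ],
    "biology": [
        "Mitochondria produce energy for the cell.",
        "The cell stores that energy as ATP.",
        "ATP powers muscle contraction.",
    ],
}


def make_pairs(corpus, seed=0):
    """Build (first, second, label) pairs following the described design.
    label 1: the second sentence immediately follows the first in one document.
    label 0: the second sentence is drawn from a different, unrelated document."""
    rng = random.Random(seed)
    pairs = []
    for name, sentences in corpus.items():
        other_docs = [doc for doc in corpus if doc != name]
        for i in range(len(sentences) - 1):
            pairs.append((sentences[i], sentences[i + 1], 1))
            other = corpus[rng.choice(other_docs)]
            pairs.append((sentences[i], rng.choice(other), 0))
    return pairs


# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "for", "that", "as", "its", "had"}


def content_words(sentence):
    """Lowercased alphabetic tokens, minus the illustrative stopwords."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}


def overlap_shortcut(first, second):
    """Hypothetical shortcut: predict 'consecutive' (1) if the two sentences
    share any content word. It ignores order, reference, and logical flow."""
    return int(bool(content_words(first) & content_words(second)))


def accuracy(pairs, predict):
    """Fraction of pairs whose predicted label matches the true label."""
    return sum(predict(a, b) == label for a, b, label in pairs) / len(pairs)


pairs = make_pairs(CORPUS)
print(accuracy(pairs, overlap_shortcut))  # 1.0

# Reversed order: "that energy" now has no antecedent, yet the shortcut says 1.
print(overlap_shortcut(CORPUS["biology"][2], CORPUS["biology"][1]))  # 1

# Two unrelated sentences linked only by the ambiguous word "bank".
print(overlap_shortcut(
    "The children went to the river bank to skip stones.",
    "The First National Bank offers competitive loan rates.",
))  # 1
```

Because every negative pair crosses a topic boundary, the word-overlap rule labels this toy data perfectly without modelling order, reference, or logical flow. The last two calls show the same rule accepting a pair whose internal reference is broken and a pair connected only by a shared surface word. Nothing in negatives built this way would push a learner past such a cue.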
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Related
A language model is pre-trained on a task where it must determine if two sentences are consecutive in a text. When presented with the pair: 'The children went to the river bank to skip stones.' and 'The First National Bank offers competitive loan rates.', the model incorrectly classifies them as consecutive. Which of the following best explains why the model made this specific error?
Diagnosing Flawed Model Behavior