Analyzing a Flawed Pre-training Strategy
A data scientist is pre-training an encoder-decoder model on a large text corpus. For each document, they create a training example by selecting a single random sentence as the input to the encoder and the immediately following sentence as the target for the decoder. After extensive training, they observe that the model is very good at generating a plausible next sentence but fails to generate long, coherent multi-paragraph continuations that rely on the broader context of the original document. Based on the principles of this training approach, explain the most likely flaw in the data preparation strategy that is causing this specific performance issue.
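For concreteness, here is a minimal sketch of the data-preparation step the scenario describes. The function name, the assumption that documents arrive pre-split into sentences, and the use of Python's random module are illustrative choices, not details given in the question.

```python
import random

def make_training_example(sentences: list[str]) -> tuple[str, str]:
    """Build one (encoder input, decoder target) pair exactly as the
    scenario describes: one random sentence in, the next sentence out.
    Every other sentence in the document is discarded."""
    # Pick any sentence except the last, so a "next sentence" exists.
    i = random.randrange(len(sentences) - 1)
    encoder_input = sentences[i]        # a single sentence of context
    decoder_target = sentences[i + 1]   # only the immediately following sentence
    return encoder_input, decoder_target

# Hypothetical usage on an already sentence-split document.
doc = ["Sentence one.", "Sentence two.", "Sentence three."]
src, tgt = make_training_example(doc)
```

The sketch makes the scale of each example visible: no training pair ever spans more than two adjacent sentences, which is the property of the data preparation the question asks you to reason about.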
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A research team aims to pre-train a sequence-to-sequence model for various text generation tasks using a massive, unlabeled text corpus. Their proposed training strategy is as follows: for each document, they will randomly split it into an initial segment and a concluding segment. The model's encoder will process the entire initial segment at once to form a contextual understanding. The model will then be trained to use its decoder to generate the concluding segment, conditioned on the encoder's output. Which of the following statements provides the most accurate evaluation of this strategy for the team's objective? (A minimal sketch of this split appears after this list.)
Comparison of Prefix Language Modeling and Causal Language Modeling
You are preparing a single training example for an encoder-decoder model using a self-supervised objective on a large, unlabeled text document. Arrange the following actions into the correct chronological sequence for one complete training step. (One illustrative ordering is sketched after this list.)
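For the first related question above, a minimal sketch of the proposed document split, assuming a token-level split and a uniformly random split point; neither detail is specified in the question, and the function name is illustrative.

```python
import random

def make_prefix_lm_example(tokens: list[str]) -> tuple[list[str], list[str]]:
    """Randomly split one document into an initial segment (encoder
    input) and a concluding segment (decoder target), as the team's
    strategy describes."""
    # Choose a split point so both segments are non-empty.
    split = random.randrange(1, len(tokens))
    prefix = tokens[:split]        # encoder sees the whole initial segment
    continuation = tokens[split:]  # decoder learns to generate the rest
    return prefix, continuation
```

Because the encoder conditions on the entire initial segment before the decoder generates the continuation, this setup corresponds to the prefix language modeling objective that the second related item contrasts with causal language modeling.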
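For the sequencing question above, the question's own action items are not reproduced here, so the sketch below shows one generic chronological ordering of a self-supervised encoder-decoder training step, not the official answer. It uses a toy PyTorch model; the vocabulary size, model dimensions, learning rate, and random stand-in token ids are all placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy components; sizes are arbitrary placeholders.
VOCAB, DIM = 100, 32
model = nn.Transformer(d_model=DIM, nhead=4, num_encoder_layers=1,
                       num_decoder_layers=1, batch_first=True)
embed = nn.Embedding(VOCAB, DIM)
head = nn.Linear(DIM, VOCAB)
opt = torch.optim.Adam([*model.parameters(), *embed.parameters(),
                        *head.parameters()], lr=1e-3)

def training_step(src_ids: torch.Tensor, tgt_ids: torch.Tensor) -> float:
    """One complete self-supervised step, in chronological order."""
    # 1. Shift the target for teacher forcing: the decoder reads
    #    tokens 0..n-1 and is trained to predict tokens 1..n.
    dec_in, labels = tgt_ids[:, :-1], tgt_ids[:, 1:]
    # 2. Mask future positions so the decoder stays autoregressive.
    mask = model.generate_square_subsequent_mask(dec_in.size(1))
    # 3. Encode the input segment and decode conditioned on it.
    out = model(embed(src_ids), embed(dec_in), tgt_mask=mask)
    # 4. Compute cross-entropy against the shifted target tokens.
    loss = F.cross_entropy(head(out).reshape(-1, VOCAB),
                           labels.reshape(-1))
    # 5. Backpropagate and update the parameters.
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Stand-in token ids for one (input segment, target segment) pair.
training_step(torch.randint(0, VOCAB, (1, 8)),
              torch.randint(0, VOCAB, (1, 9)))
```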