Benefit of Consecutive Token Masking
A language model is trained with a method in which it receives a sentence where a sequence of several adjacent words has been replaced by special mask markers (e.g., The team celebrated their [MASK] [MASK] [MASK] .). The model's task is to generate the original, complete sentence (e.g., The team celebrated their hard-earned victory .). Beyond simply learning vocabulary, what specific capability related to language structure does this training method help the model develop?
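To make the pairing concrete, below is a minimal Python sketch of how such a training pair could be constructed. The function name make_span_masking_pair and the whitespace tokenization are illustrative assumptions, not part of any particular library; real pipelines use subword tokenizers, but the input/target pairing works the same way.

```python
import random

def make_span_masking_pair(sentence, span_length=3, seed=None):
    """Build a (masked input, full target) pair by replacing one
    contiguous run of `span_length` tokens with [MASK] markers.

    Whitespace tokenization is assumed for illustration only.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    if span_length >= len(tokens):
        raise ValueError("span_length must be shorter than the sentence")
    # Pick where the masked span starts, then overwrite that run.
    start = rng.randrange(len(tokens) - span_length + 1)
    masked = tokens[:start] + ["[MASK]"] * span_length + tokens[start + span_length:]
    # The model receives the masked sequence and is trained to emit
    # the original sentence, so all masked positions must be inferred
    # jointly rather than one token at a time.
    return " ".join(masked), sentence

# Example: reproduce the style of pair described above.
src, tgt = make_span_masking_pair(
    "The team celebrated their hard-earned victory .", span_length=3
)
print("Input :", src)
print("Target:", tgt)
```

Because several adjacent positions must be filled at once, a pair built this way rewards learning dependencies within multi-word phrases rather than isolated single-word guesses.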
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained with pairs of text sequences. Consider the following training example:
Input: The committee reviewed the [MASK] [MASK] [MASK] and approved it.
Target Output: The committee reviewed the detailed project proposal and approved it.
Based on this training example, what is the primary learning objective for the model?
Improving Phrase Generation in a Language Model