Learn Before
Analyzing the Challenge of Consecutive Masking
A language model is pre-trained using a masked language modeling objective. During one training step, it sees the input "The cat sat on the [MASK] [MASK]." Explain why predicting the two masked tokens in this scenario is more challenging for the model than predicting two separate, non-adjacent masked tokens in a longer sentence (e.g., "The [MASK] cat sat on the warm [MASK].").
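To make the contrast concrete, here is a minimal sketch that queries a BERT-style masked language model on both inputs. The `transformers` package and the `bert-base-uncased` checkpoint are illustrative assumptions, not part of the original question; the key point is that the model fills every [MASK] independently in a single forward pass, so adjacent masks cannot condition on each other's value.

```python
# Minimal sketch: independent per-position predictions for each [MASK].
# Assumes the Hugging Face `transformers` library and the public
# `bert-base-uncased` checkpoint (illustrative choices).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def show_mask_predictions(text: str, k: int = 3) -> None:
    """Print the top-k candidate tokens for every [MASK] position in `text`."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Positions in the input where the token is [MASK].
    mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    print(text)
    for pos in mask_positions.tolist():
        top_ids = logits[0, pos].topk(k).indices.tolist()
        print(f"  position {pos}: {tokenizer.convert_ids_to_tokens(top_ids)}")

# Adjacent masks: neither position observes the other, so the model must
# produce a coherent two-token phrase from two independent distributions.
show_mask_predictions("The cat sat on the [MASK] [MASK].")

# Non-adjacent masks: each [MASK] is flanked by observed tokens that
# strongly constrain it on their own.
show_mask_predictions("The [MASK] cat sat on the warm [MASK].")
```

Because the two distributions for the adjacent masks are produced independently, the model can assign high probability to each token of mismatched pairs without any mechanism ensuring the pair it ultimately fills in is jointly coherent.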
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of Denoising Task with Consecutive Token Masking
Representing Masked Spans with Sentinel Tokens
A language model is being trained to predict masked words in a text. Consider two different masking strategies:
Strategy 1: 15% of the words in a sentence are masked individually at random positions. Example:
The quick [MASK] fox jumps [MASK] the lazy dog.
Strategy 2: A contiguous span of several words is masked. Example:
The quick [MASK] [MASK] [MASK] the lazy dog.
How does using Strategy 2 (masking a contiguous span) primarily alter the learning challenge for the model compared to Strategy 1? (A sketch contrasting the two strategies follows this list.)
Analyzing a Masked Language Modeling Task
Analyzing Model Performance Discrepancy
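As a concrete illustration of the two strategies in the related question above, the following sketch applies each one to a whitespace-tokenized sentence. This is plain Python; the 15% masking rate, the span length, and the helper names are illustrative assumptions, and real pipelines operate on subword tokens rather than whole words.

```python
# Sketch of the two masking strategies on a whitespace-tokenized sentence.
# Rates, span length, and function names are illustrative assumptions.
import random

random.seed(0)  # for a reproducible demonstration

def mask_random_tokens(tokens: list[str], rate: float = 0.15) -> list[str]:
    """Strategy 1: mask each token independently with probability `rate`."""
    return [("[MASK]" if random.random() < rate else t) for t in tokens]

def mask_contiguous_span(tokens: list[str], span_len: int = 3) -> list[str]:
    """Strategy 2: mask one contiguous span of `span_len` tokens."""
    start = random.randrange(len(tokens) - span_len + 1)
    return [
        "[MASK]" if start <= i < start + span_len else t
        for i, t in enumerate(tokens)
    ]

sentence = "The quick brown fox jumps over the lazy dog".split()
print(" ".join(mask_random_tokens(sentence)))
print(" ".join(mask_contiguous_span(sentence)))
```

Under Strategy 1, each masked word usually sits between observed neighbors that constrain it; under Strategy 2, the interior of the span has no observed neighbors at all, so the model must generate several consecutive tokens that are coherent with each other as well as with the surrounding context.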