Learn Before
A language model is being trained to predict masked words in a text. Consider two different masking strategies:
Strategy 1: 15% of the words in a sentence are masked individually at random positions.
Example: The quick [MASK] fox jumps [MASK] the lazy dog.
Strategy 2: A contiguous span of several words is masked.
Example: The quick [MASK] [MASK] [MASK] the lazy dog.
How does using Strategy 2 (masking a contiguous span) primarily alter the learning challenge for the model compared to Strategy 1?
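The two strategies can be sketched with a simple whitespace tokenizer. This is a minimal illustration, not any particular model's implementation: the function names are made up here, the 15% rate comes from Strategy 1 above, and `span_len=3` is chosen to match the Strategy 2 example.

```python
import random

def mask_random(tokens, rate=0.15, seed=0):
    """Strategy 1: mask ~rate of the tokens at independent random positions."""
    rng = random.Random(seed)
    n = max(1, round(len(tokens) * rate))  # at least one mask
    positions = set(rng.sample(range(len(tokens)), n))
    return ["[MASK]" if i in positions else t for i, t in enumerate(tokens)]

def mask_span(tokens, span_len=3, seed=0):
    """Strategy 2: mask one contiguous span of span_len tokens."""
    rng = random.Random(seed)
    start = rng.randrange(len(tokens) - span_len + 1)
    return ["[MASK]" if start <= i < start + span_len else t
            for i, t in enumerate(tokens)]

sentence = "The quick brown fox jumps over the lazy dog".split()
print(" ".join(mask_random(sentence)))  # scattered [MASK] positions
print(" ".join(mask_span(sentence)))    # one [MASK] [MASK] [MASK] run
```

Note the key difference the sketch makes visible: in `mask_random`, each masked token is usually surrounded by visible context on both sides, whereas in `mask_span` the model must reconstruct several adjacent tokens with no local context inside the gap.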
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of Denoising Task with Consecutive Token Masking
Representing Masked Spans with Sentinel Tokens
Analyzing a Masked Language Modeling Task
Analyzing Model Performance Discrepancy
Analyzing the Challenge of Consecutive Masking