Learn Before
Analyzing Model Performance Discrepancy
Based on the training process described in the case study, what is the most likely reason for the model's poor performance on inputs with multiple adjacent [MASK] tokens? Explain the connection between the training data and the observed performance gap.
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of Denoising Task with Consecutive Token Masking
Representing Masked Spans with Sentinel Tokens
A language model is being trained to predict masked words in a text. Consider two different masking strategies:
Strategy 1: 15% of the words in a sentence are masked individually at random positions. Example:
The quick [MASK] fox jumps [MASK] the lazy dog.
Strategy 2: A contiguous span of several words is masked. Example:
The quick [MASK] [MASK] [MASK] the lazy dog.
How does using Strategy 2 (masking a contiguous span) primarily alter the learning challenge for the model compared to Strategy 1?
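The two masking strategies can be illustrated with a minimal Python sketch. It assumes whitespace tokenization and one [MASK] per masked word; the function names (mask_random_tokens, mask_contiguous_span, longest_mask_run) and the default span length are hypothetical choices for illustration, not part of any particular library or of the training setup described in the case study.

```python
import random

MASK = "[MASK]"


def mask_random_tokens(tokens, mask_rate=0.15, rng=None):
    """Strategy 1 (sketch): mask individual tokens at random positions."""
    rng = rng or random.Random()
    # Mask at least one token so short sentences still yield a training signal.
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]


def mask_contiguous_span(tokens, span_length=3, rng=None):
    """Strategy 2 (sketch): mask one contiguous span of tokens."""
    rng = rng or random.Random()
    span_length = min(span_length, len(tokens))
    # Pick a start index such that the whole span fits inside the sentence.
    start = rng.randrange(len(tokens) - span_length + 1)
    end = start + span_length
    return [MASK if start <= i < end else tok for i, tok in enumerate(tokens)]


def longest_mask_run(tokens):
    """Length of the longest run of adjacent [MASK] tokens."""
    longest = current = 0
    for tok in tokens:
        current = current + 1 if tok == MASK else 0
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    sentence = "The quick brown fox jumps over the lazy dog.".split()
    rng = random.Random(0)  # fixed seed only to make runs repeatable
    individually_masked = mask_random_tokens(sentence, rng=rng)
    span_masked = mask_contiguous_span(sentence, rng=rng)
    print(" ".join(individually_masked), longest_mask_run(individually_masked))
    print(" ".join(span_masked), longest_mask_run(span_masked))
```

With a low mask rate, Strategy 1 rarely places masks next to each other, so nearly every masked position has real words on both sides. Strategy 2 always produces a run of adjacent masks, so the positions inside the span must be predicted without immediate neighbors as context.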
Analyzing a Masked Language Modeling Task
Analyzing Model Performance Discrepancy
Analyzing the Challenge of Consecutive Masking