Learn Before
Analyzing a Masked Language Modeling Task
A language model is being trained with the following example:
Input: The cat [MASK] [MASK] on the mat.
Target: The cat sat lazily on the mat.
Explain why predicting the second [MASK] token (which corresponds to 'lazily') is a more difficult task for the model in this specific scenario than if the input had been 'The cat sat [MASK] on the mat.'
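One way to see the difference is to count how much unmasked local context surrounds each [MASK] position. The sketch below (an illustration, not part of the original exercise; the `visible_neighbors` helper and the window size are assumptions for demonstration) shows that when 'sat' is also masked, the position for 'lazily' has fewer visible neighboring tokens to condition on.

```python
def visible_neighbors(tokens, mask_idx, window=2):
    """Count unmasked tokens within `window` positions of a masked token."""
    count = 0
    for offset in range(-window, window + 1):
        j = mask_idx + offset
        if offset != 0 and 0 <= j < len(tokens) and tokens[j] != "[MASK]":
            count += 1
    return count

# Index 3 corresponds to 'lazily' in both inputs.
consecutive = "The cat [MASK] [MASK] on the mat .".split()
single = "The cat sat [MASK] on the mat .".split()

print(visible_neighbors(consecutive, 3))  # → 3: the left neighbor ('sat') is masked too
print(visible_neighbors(single, 3))       # → 4: 'sat' is visible on the left
```

The count is only a proxy, but it captures the core difficulty: with consecutive masks, the model must predict 'lazily' without access to the verb it modifies.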
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of Denoising Task with Consecutive Token Masking
Representing Masked Spans with Sentinel Tokens
A language model is being trained to predict masked words in a text. Consider two different masking strategies:
Strategy 1: 15% of the words in a sentence are masked individually at random positions. Example:
The quick [MASK] fox jumps [MASK] the lazy dog.
Strategy 2: A contiguous span of several words is masked. Example:
The quick [MASK] [MASK] [MASK] the lazy dog.
How does using Strategy 2 (masking a contiguous span) primarily alter the learning challenge for the model compared to Strategy 1?
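The two strategies can be sketched in a few lines of Python. This is a minimal illustration of the masking step only (the function names, the fixed random seed, and the hard-coded span position are assumptions for demonstration, not part of the exercise):

```python
import random

MASK = "[MASK]"

def mask_random(tokens, rate=0.15, seed=0):
    """Strategy 1: mask each token independently with probability `rate`."""
    rng = random.Random(seed)
    return [MASK if rng.random() < rate else t for t in tokens]

def mask_span(tokens, start, length):
    """Strategy 2: mask a contiguous span of `length` tokens from `start`."""
    return [MASK if start <= i < start + length else t
            for i, t in enumerate(tokens)]

sentence = "The quick brown fox jumps over the lazy dog".split()
print(" ".join(mask_random(sentence)))       # scattered masks, each surrounded by context
print(" ".join(mask_span(sentence, 2, 3)))   # "The quick [MASK] [MASK] [MASK] over the lazy dog"
```

Under Strategy 1 every masked position typically has unmasked neighbors on both sides; under Strategy 2 the interior masks border only other masks, so the model must reconstruct the span from more distant context.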
Analyzing a Masked Language Modeling Task
Analyzing Model Performance Discrepancy
Analyzing the Challenge of Consecutive Masking