Multiple Choice

A language model is being trained to predict masked words in a text. Consider two different masking strategies:

Strategy 1: 15% of the words in a sentence are masked, with each masked position chosen independently at random. Example: The quick [MASK] fox jumps [MASK] the lazy dog.

Strategy 2: A contiguous span of several words is masked. Example: The quick [MASK] [MASK] [MASK] the lazy dog.
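For concreteness, the two strategies can be sketched in a few lines of Python. This is a minimal illustration only: it assumes simple whitespace tokenization, and the function names, mask-count rounding, and fixed span length are hypothetical choices for this sketch, not taken from any particular training pipeline.

```python
import random

def mask_random_words(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Strategy 1: mask a fraction of positions, each chosen independently at random."""
    n_to_mask = max(1, round(len(tokens) * mask_rate))  # assumed rounding rule
    positions = set(random.sample(range(len(tokens)), n_to_mask))
    return [mask_token if i in positions else tok for i, tok in enumerate(tokens)]

def mask_contiguous_span(tokens, span_length=3, mask_token="[MASK]"):
    """Strategy 2: mask one contiguous span of span_length tokens."""
    start = random.randrange(len(tokens) - span_length + 1)
    return [mask_token if start <= i < start + span_length else tok
            for i, tok in enumerate(tokens)]

sentence = "The quick brown fox jumps over the lazy dog".split()
print(" ".join(mask_random_words(sentence)))     # e.g. The [MASK] brown fox jumps over the lazy dog
print(" ".join(mask_contiguous_span(sentence)))  # e.g. The quick [MASK] [MASK] [MASK] the lazy dog
```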

How does using Strategy 2 (masking a contiguous span) primarily alter the learning challenge for the model compared to Strategy 1?

Tags: Ch.1 Pre-training - Foundations of Large Language Models; Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Analysis in Bloom's Taxonomy; Cognitive Psychology; Psychology; Social Science; Empirical Science; Science