Learn Before
A language model is being trained to predict masked words in a text. Consider two different masking strategies:
Strategy 1: 15% of the words in a sentence are masked individually at random positions.
Example: The quick [MASK] fox jumps [MASK] the lazy dog.
Strategy 2: A contiguous span of several words is masked.
Example: The quick [MASK] [MASK] [MASK] the lazy dog.
How does using Strategy 2 (masking a contiguous span) primarily alter the learning challenge for the model compared to Strategy 1?
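The two strategies can be sketched with a simple whitespace tokenizer. This is a minimal illustration, not any particular model's implementation: the function names are made up here, the 15% rate comes from Strategy 1 above, and `span_len=3` is chosen to match the Strategy 2 example.

```python
import random

def mask_random(tokens, rate=0.15, seed=0):
    """Strategy 1: mask ~rate of the tokens at independent random positions."""
    rng = random.Random(seed)
    n = max(1, round(len(tokens) * rate))  # at least one mask
    positions = set(rng.sample(range(len(tokens)), n))
    return ["[MASK]" if i in positions else t for i, t in enumerate(tokens)]

def mask_span(tokens, span_len=3, seed=0):
    """Strategy 2: mask one contiguous span of span_len tokens."""
    rng = random.Random(seed)
    start = rng.randrange(len(tokens) - span_len + 1)
    return ["[MASK]" if start <= i < start + span_len else t
            for i, t in enumerate(tokens)]

sentence = "The quick brown fox jumps over the lazy dog".split()
print(" ".join(mask_random(sentence)))  # scattered [MASK] positions
print(" ".join(mask_span(sentence)))    # one [MASK] [MASK] [MASK] run
```

Note the key difference the sketch makes visible: in `mask_random`, each masked token is usually surrounded by visible context on both sides, whereas in `mask_span` the model must reconstruct several adjacent tokens with no local context inside the gap.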
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of Denoising Task with Consecutive Token Masking
Representing Masked Spans with Sentinel Tokens
Analyzing a Masked Language Modeling Task
Analyzing Model Performance Discrepancy
Analyzing the Challenge of Consecutive Masking