Short Answer

Benefit of Consecutive Token Masking

A language model is trained using a method in which it receives a sentence with a run of several adjacent words each replaced by a special mask marker (e.g., The team celebrated their [MASK] [MASK] [MASK] .). The model's task is to generate the original, complete sentence (e.g., The team celebrated their hard-earned victory .). Beyond simply learning vocabulary, what specific capability related to language structure does this training method help the model develop?
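
To make the setup concrete, here is a minimal sketch of the corruption step in Python. The function name `mask_consecutive_span`, the whitespace tokenization, and the fixed span length of three are illustrative assumptions, not details given in the question.

```python
import random

def mask_consecutive_span(tokens, span_len=3, mask_token="[MASK]"):
    """Replace a random run of `span_len` adjacent tokens with mask markers.

    Returns the (corrupted input, original target) pair this training
    method learns from. Illustrative only; real pretraining pipelines
    also sample span positions and lengths.
    """
    if len(tokens) <= span_len:
        # Sentence too short to mask a span; return it unchanged.
        return tokens[:], tokens[:]
    start = random.randrange(len(tokens) - span_len + 1)
    corrupted = tokens[:start] + [mask_token] * span_len + tokens[start + span_len:]
    return corrupted, tokens[:]

sentence = "The team celebrated their hard-earned victory .".split()
inp, target = mask_consecutive_span(sentence)
print("input :", " ".join(inp))    # e.g. The team celebrated their [MASK] [MASK] [MASK]
print("target:", " ".join(target)) # The team celebrated their hard-earned victory .
```

Running the sketch prints one corrupted/target pair; the start of the masked span is chosen at random, so the output varies between runs.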



Tags

Ch.1 Pre-training - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science