Concept

Drawback of Masked Language Modeling: The [MASK] Token Discrepancy

A significant drawback of Masked Language Modeling is its reliance on a special [MASK] token during training. Because this artificial token never appears in natural text at test time, it creates a discrepancy between the inputs the model sees during training and the inputs it sees during real-world inference.
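The mismatch is easy to see in a minimal sketch of MLM input corruption. This is a simplified, hypothetical illustration, not a production recipe: real implementations such as BERT mask roughly 15% of tokens and, of those, keep some unchanged or swap in random tokens to soften exactly this discrepancy.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """Corrupt a token sequence for MLM training.

    Returns (corrupted_input, targets): each masked position carries its
    original token as the prediction target; unmasked positions carry None.
    """
    rng = random.Random(seed)
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)   # artificial token, only seen in training
            targets.append(tok)      # the model must recover the original
        else:
            corrupted.append(tok)
            targets.append(None)     # not a prediction target
    return corrupted, targets

sentence = "the cat sat on the mat".split()

# Training-time input: contains [MASK] tokens.
train_input, targets = mask_tokens(sentence)
print(train_input)

# Test-time input: natural text, so [MASK] never occurs.
print(MASK in sentence)  # False
```

The model therefore learns to make predictions conditioned on a token it will never encounter at inference time, which is the train/test discrepancy described above.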

Updated 2026-04-15
