1Cademy - Drawback of Masked Language Modeling: The [MASK] Token Discrepancy

Learn Before

Self-Supervised Pre-training of Encoders via Masked Language Modeling

Concept

Drawback of Masked Language Modeling: The [MASK] Token Discrepancy

A significant drawback of Masked Language Modeling is its reliance on a special $[\mathrm{MASK}]$ token during training. This artificial token does not occur in natural text, so the model never encounters it at test time or during real-world inference. The result is a discrepancy between the inputs the model is trained on and the inputs it processes in practice.
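The discrepancy can be made concrete with a small sketch. The code below is illustrative only: the helper names `mask_tokens` and `mask_fraction`, the example sentence, and the masking probabilities are all assumptions, and the masking scheme is deliberately simplified (each token is independently replaced by the mask token with a fixed probability). It compares how often the mask token appears in a training-style input versus an unmodified input like those seen at inference.

```python
import random

MASK = "[MASK]"  # the special token that only exists during training


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace each token with MASK independently with probability mask_prob.

    Simplified, hypothetical masking scheme used only for illustration.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def mask_fraction(tokens):
    """Fraction of positions in the sequence occupied by the MASK token."""
    return sum(tok == MASK for tok in tokens) / len(tokens)


sentence = "the quick brown fox jumps over the lazy dog".split()

# Training-style input: some tokens are hidden behind MASK.
train_input = mask_tokens(sentence, mask_prob=0.3, seed=0)

# Inference-style input: natural text, which contains no MASK token.
inference_input = sentence

print("training input: ", train_input)
print("inference input:", inference_input)
print("MASK fraction (training): ", mask_fraction(train_input))
print("MASK fraction (inference):", mask_fraction(inference_input))
```

Under this sketch, the mask fraction for the inference input is always zero, while training inputs contain the mask token at roughly the chosen rate. That gap in the input distribution is the discrepancy described above.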