Concept

Masked Language Modeling

Masked Language Modeling (MLM) is a widely used pre-training method for encoder models and forms the foundation of models like BERT. The core idea is to create a prediction task by masking a fraction of the tokens in an input sequence (typically 15%, as in BERT); the model is then trained to recover the original tokens using the surrounding unmasked tokens as context. Because both the left and right context are visible, this objective forces the model to develop a deep, bidirectional understanding of language.
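
To make the procedure concrete, below is a minimal sketch of BERT-style input masking in PyTorch. The 15% target rate and the 80/10/10 replacement split (80% of targets become [MASK], 10% become a random token, 10% stay unchanged) follow the original BERT recipe; the function name mask_tokens and the arguments mask_token_id and special_ids are illustrative, not part of any particular library.

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, special_ids, mlm_prob=0.15):
    """BERT-style masking sketch: pick ~15% of tokens as prediction targets,
    then replace 80% of them with [MASK], 10% with a random token,
    and leave the remaining 10% unchanged."""
    labels = input_ids.clone()

    # Sample which positions become prediction targets.
    prob_matrix = torch.full(input_ids.shape, mlm_prob)
    for sid in special_ids:
        prob_matrix[input_ids == sid] = 0.0  # never mask special tokens like [CLS]/[SEP]
    targets = torch.bernoulli(prob_matrix).bool()
    labels[~targets] = -100  # non-target positions are ignored by the loss

    # 80% of targets -> [MASK]
    replaced = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & targets
    input_ids[replaced] = mask_token_id

    # 10% of targets -> a random vocabulary token (half of the remaining 20%)
    randomized = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & targets & ~replaced
    input_ids[randomized] = torch.randint(vocab_size, input_ids.shape)[randomized]

    # The final 10% of targets keep their original token.
    return input_ids, labels
```

The value -100 is the default ignore_index of torch.nn.CrossEntropyLoss, so positions that were not selected as targets contribute nothing to the training loss; the model is graded only on recovering the masked tokens.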
