BERT's Masked Language Model Pre-training Process

BERT's masked language model (MLM) objective is trained with a specific data-corruption process. First, 15% of the tokens in an input sequence are randomly selected as prediction targets. These selected tokens are then modified according to a fixed distribution: 80% are replaced with the special [MASK] token, 10% are replaced with a random token from the vocabulary, and the remaining 10% are left unchanged. This produces a "noisy" version of the input. The Transformer encoder processes the corrupted sequence, and the model's objective is to predict the original, unmodified tokens from the output hidden states at the selected positions.
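
A minimal sketch of this corruption step in plain Python. The token IDs, MASK_ID, PAD_ID, and VOCAB_SIZE here are hypothetical placeholders; a real BERT setup takes these from its WordPiece tokenizer and would also exclude [CLS]/[SEP] positions from selection, which is omitted here for brevity.

```python
import random

# Hypothetical special-token IDs; a real BERT vocabulary defines its own.
MASK_ID = 103      # [MASK]
PAD_ID = 0         # [PAD] (never selected for prediction)
VOCAB_SIZE = 30522

def corrupt_for_mlm(token_ids, mask_prob=0.15, seed=None):
    """Apply BERT-style MLM corruption to a list of token IDs.

    Returns (corrupted_ids, labels), where labels[i] holds the original
    token ID at selected positions and -100 (ignore) everywhere else.
    """
    rng = random.Random(seed)
    corrupted = list(token_ids)
    labels = [-100] * len(token_ids)          # -100 = position is not predicted

    for i, tok in enumerate(token_ids):
        if tok == PAD_ID:
            continue                          # never select padding
        if rng.random() >= mask_prob:
            continue                          # ~15% of tokens become targets
        labels[i] = tok                       # the model must recover this token
        r = rng.random()
        if r < 0.8:                           # 80%: replace with [MASK]
            corrupted[i] = MASK_ID
        elif r < 0.9:                         # 10%: replace with a random token
            corrupted[i] = rng.randrange(VOCAB_SIZE)
        # remaining 10%: leave the original token unchanged

    return corrupted, labels

if __name__ == "__main__":
    ids = [101, 7592, 2088, 2003, 2307, 102]  # toy example sequence
    noisy, targets = corrupt_for_mlm(ids, seed=0)
    print(noisy)
    print(targets)
```

Keeping 10% of the selected tokens unchanged and replacing another 10% with random tokens, rather than always inserting [MASK], reduces the mismatch between pre-training and fine-tuning, since the [MASK] token never appears in downstream inputs.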
