Unchanged Tokens in BERT's MLM Strategy
In BERT's Masked Language Modeling (MLM) strategy, 10% of the tokens chosen for prediction are kept in their original, unchanged form within the input sequence; the model is still trained to predict these tokens from their surrounding context.
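For context, the full split in the original BERT recipe selects 15% of the input tokens as prediction targets and then, of those, replaces 80% with [MASK], replaces 10% with a random token, and leaves 10% unchanged. The sketch below is a minimal Python illustration of that corruption step, not BERT's actual implementation; the function name corrupt_tokens, its arguments, and the toy vocabulary are invented for this example.

```python
import random

def corrupt_tokens(tokens, vocab, select_prob=0.15, mask_token="[MASK]"):
    """BERT-style MLM corruption: each token is selected for prediction
    with probability select_prob; of the selected tokens, 80% become
    mask_token, 10% become a random vocabulary token, and 10% are left
    unchanged."""
    corrupted = list(tokens)
    labels = [None] * len(tokens)    # None = not a prediction target
    for i, tok in enumerate(tokens):
        if random.random() >= select_prob:
            continue                 # token not selected for prediction
        labels[i] = tok              # model must recover the original token
        r = random.random()
        if r < 0.8:
            corrupted[i] = mask_token             # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = random.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: keep the token unchanged in the input
    return corrupted, labels

tokens = "the cat sat on the mat".split()
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
print(corrupt_tokens(tokens, vocab))
```

The unchanged 10% matters because the model cannot assume that an unmasked input token is always the true token, so it must maintain informative representations for every position rather than only for [MASK] positions.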
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Token Masking in BERT's MLM Strategy
Random Token Replacement in BERT's MLM Strategy
Unchanged Tokens in BERT's MLM Strategy
When pre-training a language model, a common technique is to select a subset of tokens in an input sequence and train the model to predict them. A simple approach would be to replace every selected token with a special
[MASK] symbol. However, a more sophisticated strategy is often used where, for the selected tokens, some are replaced with [MASK], some are replaced with a random token, and some are left unchanged. What is the primary analytical reason for adopting this more complex, multi-faceted strategy over simply masking 100% of the selected tokens?
Critiquing a Pre-training Implementation
In a common self-supervised pre-training approach, a fraction of tokens in an input sequence is selected for the model to predict. Each of these selected tokens is then modified in one of three ways before being fed to the model. Match each modification method with its corresponding description.
Learn After
Example of an Unchanged Token in a BERT Input Sequence
A language model is pre-trained using a method where 15% of the words in an input sentence are selected for prediction. Of these selected words, a small fraction (10%) are intentionally left in their original form, while the model is still tasked with predicting them based on the surrounding context. What is the most significant reason for this strategy of leaving some target words unchanged? (A worked count of these fractions appears after this list.)
Calculating Token Modifications in Pre-training
Critique of a Modified Pre-training Strategy
Purpose of Unchanged Tokens in BERT's MLM Strategy
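To make the fractions in the question above concrete, here is a worked count under an assumed sequence length of 512 tokens (a round number chosen for illustration, not taken from the question itself).

```python
seq_len = 512
selected = round(0.15 * seq_len)      # prediction targets: 77
masked = round(0.80 * selected)       # replaced with [MASK]: 62
randomized = round(0.10 * selected)   # replaced with a random token: 8
unchanged = round(0.10 * selected)    # left in original form: 8

# Independent rounding makes the parts sum to 78 rather than 77;
# real implementations assign each selected token to exactly one bucket.
print(selected, masked, randomized, unchanged)
```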