Token Selection and Modification Strategy in BERT's MLM
In the standard implementation of Masked Language Modeling (MLM) for BERT, 15% of the tokens in each input sequence are randomly selected for prediction. Each selected token is then modified in one of three ways: with 80% probability it is replaced by the special [MASK] token, with 10% probability it is replaced by a random token from the vocabulary, and with 10% probability it is left unchanged.
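As an illustration, here is a minimal sketch of this selection-and-modification step. The function and variable names are hypothetical, and this is not the reference BERT implementation; it only demonstrates the 15% selection followed by the 80/10/10 split described above.

```python
import random

MASK_TOKEN = "[MASK]"

def apply_bert_masking(tokens, vocab, select_prob=0.15,
                       mask_prob=0.8, random_prob=0.1):
    """Sketch of BERT-style MLM corruption (hypothetical helper).

    Of the tokens chosen for prediction (select_prob of the sequence),
    80% become [MASK], 10% become a random vocabulary token, and 10%
    are left unchanged. Returns the corrupted sequence and the list of
    (position, original_token) prediction targets.
    """
    corrupted = list(tokens)
    targets = []
    for i, token in enumerate(tokens):
        if random.random() >= select_prob:
            continue  # token not selected for prediction
        targets.append((i, token))
        r = random.random()
        if r < mask_prob:                  # 80%: replace with [MASK]
            corrupted[i] = MASK_TOKEN
        elif r < mask_prob + random_prob:  # 10%: replace with a random token
            corrupted[i] = random.choice(vocab)
        # else: 10%: leave the original token unchanged
    return corrupted, targets

# Example usage
vocab = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
sentence = "the quick brown fox jumps over the lazy dog".split()
corrupted, targets = apply_bert_masking(sentence, vocab)
print(corrupted)  # e.g. ['the', 'quick', '[MASK]', 'fox', ...]
print(targets)    # e.g. [(2, 'brown')]
```

Note that the model is trained to predict the original token at every selected position, including the positions that were left unchanged or replaced with a random token, not only those showing [MASK].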
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Masked vs. Causal Language Modeling
Formal Definition of the Masking Process in MLM
Example of Masked Language Modeling with Single and Multiple Masks
Training Objective of Masked Language Modeling (MLM)
Drawback of Masked Language Modeling: The [MASK] Token Discrepancy
Limitation of MLM: Ignoring Dependencies Between Masked Tokens
The Generator in Replaced Token Detection
Consecutive Token Masking in MLM
BERT's Masked Language Modeling Pre-training Pipeline
Performance Degradation and Early Stopping in Pre-training
Flexibility of Masked Language Modeling for Encoder-Decoder Training
Training Objective of the Standard BERT Model
During a self-supervised pre-training process, a model is given an input sequence where one word has been replaced by a special symbol, for example: 'The quick brown [MASK] jumps over the lazy dog.' The model's objective is to predict the original word, 'fox'. Which of the following is the direct input used by the final output layer to make this specific prediction?
Original Sequence for Masking and Deletion Examples
Arrange the following steps in the correct order to describe the process of pre-training an encoder model using a masked language modeling objective.
Evaluating a Pre-training Strategy for a Specific Application
Learn After
Token Masking in BERT's MLM Strategy
Random Token Replacement in BERT's MLM Strategy
Unchanged Tokens in BERT's MLM Strategy
When pre-training a language model, a common technique is to select a subset of tokens in an input sequence and train the model to predict them. A simple approach would be to replace every selected token with a special [MASK] symbol. However, a more sophisticated strategy is often used where, for the selected tokens, some are replaced with [MASK], some are replaced with a random token, and some are left unchanged. What is the primary analytical reason for adopting this more complex, multi-faceted strategy over simply masking 100% of the selected tokens?
Critiquing a Pre-training Implementation
In a common self-supervised pre-training approach, a fraction of tokens in an input sequence is selected for the model to predict. Each of these selected tokens is then modified in one of three ways before being fed to the model. Match each modification method with its corresponding description.