Learn Before
In a language model's pre-training, a portion of input tokens selected for prediction are substituted with a completely random token from the vocabulary, rather than always using a special placeholder like [MASK]. What is the primary analytical justification for this specific strategy?
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of Random Token Replacement in a BERT Input Sequence
Predicting from Corrupted Input
A language model's pre-training process involves selecting a subset of tokens in an input sequence for prediction. One modification technique applied to these selected tokens is to substitute them with a completely random token from the model's vocabulary. Given the original sequence:
The cat sat on the mat.
If the token "sat" is chosen for this specific random replacement technique, which of the following is a valid resulting sequence?
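As a rough illustration (not part of the original card), the sketch below shows the standard BERT-style corruption rule, in which tokens selected for prediction are replaced with [MASK] 80% of the time, with a random vocabulary token 10% of the time, and left unchanged 10% of the time. The whitespace tokenizer, the toy vocabulary, and the 15% selection rate are simplifying assumptions for this example only.

```python
import random

# Toy vocabulary used only for this illustration; a real model draws the
# random replacement from its full subword vocabulary.
VOCAB = ["the", "cat", "sat", "on", "mat", ".", "dog", "ran", "apple"]

def corrupt(tokens, select_prob=0.15, rng=random):
    """Apply the 80/10/10 corruption rule to tokens chosen for prediction."""
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < select_prob:
            targets.append(tok)                        # model must recover the original token
            r = rng.random()
            if r < 0.8:
                corrupted.append("[MASK]")             # usual placeholder
            elif r < 0.9:
                corrupted.append(rng.choice(VOCAB))    # random token replacement
            else:
                corrupted.append(tok)                  # kept unchanged
        else:
            targets.append(None)                       # not a prediction target
            corrupted.append(tok)
    return corrupted, targets

tokens = "the cat sat on the mat .".split()
print(corrupt(tokens))
# e.g. if "sat" is selected and falls in the 10% random bucket, a valid
# resulting sequence is: ['the', 'cat', 'dog', 'on', 'the', 'mat', '.']
```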