
Random Token Replacement in BERT's MLM Strategy

As part of BERT's token-corruption strategy for Masked Language Modeling, 15% of input tokens are selected for prediction. Of these, 80% are replaced with the [MASK] token, 10% are replaced with a random token from the vocabulary, and 10% are left unchanged. The random replacement intentionally introduces noise into the input, training the model to recover the original token from a corrupted sequence; this improves robustness and reduces the mismatch between pre-training (where [MASK] appears) and fine-tuning (where it does not).
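The 80/10/10 scheme can be sketched as follows. This is a minimal illustration, not BERT's actual implementation; the function name, the string-token vocabulary, and the `select_prob` parameter are all illustrative assumptions.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder for the mask symbol

def corrupt_for_mlm(tokens, vocab, select_prob=0.15, rng=None):
    """Apply BERT-style MLM corruption (a sketch).

    Each position is selected for prediction with probability
    `select_prob` (15% in BERT). Of the selected positions:
      - 80% are replaced with [MASK],
      - 10% are replaced with a random vocabulary token (the noise
        this note describes),
      - 10% are left unchanged.
    Returns the corrupted sequence and the list of selected positions,
    whose original tokens serve as prediction targets.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    targets = []
    for i in range(len(tokens)):
        if rng.random() < select_prob:
            targets.append(i)
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK_TOKEN          # 80%: mask out
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)   # 10%: random token
            # else: 10% keep the original token unchanged
    return corrupted, targets
```

Note that positions not selected for prediction are never altered, so the loss is computed only at the selected positions against their original tokens.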

Updated 2026-04-17

Tags: Ch.1 Pre-training - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences