Illustrative Example of BERT's MLM Pre-training Pipeline

The data corruption pipeline for BERT's Masked Language Model (MLM) pre-training can be illustrated with a step-by-step example. The process begins with an input sequence such as [CLS] It is raining . [SEP] I need an umbrella . [SEP]. First, 15% of the tokens are selected for modification; in this example, the selected tokens are raining, an, and umbrella. The 80/10/10 rule is then applied to these selected tokens. Of these, 80% are replaced with the [MASK] token: masking raining and an gives the intermediate sequence [CLS] It is [MASK] . [SEP] I need [MASK] umbrella . [SEP]. Next, 10% are replaced with random tokens drawn from the vocabulary: replacing umbrella with hat yields [CLS] It is [MASK] . [SEP] I need [MASK] hat . [SEP]. The final 10% are left unchanged. The resulting corrupted sequence is then fed to the Transformer encoder, which is trained to predict the original tokens at the selected positions.
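
To make the corruption step concrete, here is a minimal Python sketch of the selection and 80/10/10 logic. It assumes a whitespace-tokenized sequence and a hypothetical toy vocabulary (toy_vocab); real BERT pre-training operates on WordPiece token IDs and samples random replacements from the full vocabulary, but the control flow is the same.

```python
import random

MASK, CLS, SEP = "[MASK]", "[CLS]", "[SEP]"
SPECIAL_TOKENS = {MASK, CLS, SEP}

def corrupt_for_mlm(tokens, vocab, select_prob=0.15, seed=None):
    """BERT-style MLM corruption: select ~15% of non-special positions,
    then mask 80% of them, swap 10% for random tokens, leave 10% unchanged.
    Returns the corrupted sequence and a {position: original token} map of
    prediction targets."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        # Special tokens are never corrupted; each other position is
        # selected independently with probability select_prob (~15% in expectation).
        if tok in SPECIAL_TOKENS or rng.random() >= select_prob:
            continue
        targets[i] = tok  # the encoder must predict the original token here
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # else: 10%: keep the original token (it is still a prediction target)
    return corrupted, targets

if __name__ == "__main__":
    sentence = "[CLS] It is raining . [SEP] I need an umbrella . [SEP]".split()
    toy_vocab = ["hat", "sun", "car", "blue", "runs"]  # hypothetical toy vocabulary
    corrupted, targets = corrupt_for_mlm(sentence, toy_vocab, seed=0)
    print(" ".join(corrupted))
    print(targets)
```

Leaving 10% of the selected tokens unchanged (while still predicting them) helps reduce the mismatch between pre-training, where [MASK] appears frequently, and fine-tuning, where it never appears.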
