Calculating Token Modifications in Pre-training
An input sequence for a language model contains 1,000 tokens. During a data corruption pre-training step, 15% of these tokens are randomly selected as prediction targets. These selected tokens are then modified according to a specific distribution: 80% are replaced with a special mask symbol, 10% are replaced with a random token, and 10% are left unchanged.
Based on this process, calculate the expected number of tokens in the sequence that will be:
a) Replaced with a mask symbol.
b) Replaced with a random token.
c) Left unchanged among the selected group.
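A minimal sketch of the expected-count arithmetic (the variable names are illustrative, not from the original question):

```python
# Expected token counts under the 15% selection and 80/10/10 modification rule.
sequence_length = 1000
selected = sequence_length * 0.15      # prediction targets: 150

masked = selected * 0.80               # a) replaced with the mask symbol: 120
randomized = selected * 0.10           # b) replaced with a random token: 15
unchanged = selected * 0.10            # c) left unchanged: 15

print(int(masked), int(randomized), int(unchanged))  # 120 15 15
```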
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Illustrative Example of BERT's MLM Pre-training Pipeline
During a specific language model pre-training procedure, 15% of tokens in an input sequence are chosen for prediction. Of these chosen tokens, 80% are replaced by a special [MASK] symbol, 10% are replaced by a random token from the vocabulary, and 10% remain unchanged. What is the primary analytical reason for including the steps where tokens are replaced by a random one or left unchanged, instead of simply replacing all 100% of the chosen tokens with the [MASK] symbol?
Calculating Token Modifications in Pre-training
A specific pre-training process for language models involves intentionally corrupting an input sequence and then training the model to reconstruct the original. Arrange the following steps of this data corruption and training objective in the correct chronological order.
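As a hedged illustration of the corruption process these related questions describe, here is a minimal sketch of BERT-style 80/10/10 masking; the function name, parameters, and per-position Bernoulli selection (rather than choosing exactly 15% of positions) are simplifying assumptions, not details from the source:

```python
import random

def corrupt_for_mlm(token_ids, vocab_size, mask_id, select_prob=0.15):
    """Corrupt a token sequence for masked language modeling.

    Each position is independently chosen as a prediction target with
    probability `select_prob`; each chosen token is then masked (80%),
    replaced with a random vocabulary token (10%), or left unchanged (10%).
    Returns the corrupted sequence and the target positions to reconstruct.
    """
    corrupted = list(token_ids)
    targets = []
    for i in range(len(corrupted)):
        if random.random() < select_prob:         # step 1: select the position
            targets.append(i)
            r = random.random()
            if r < 0.80:                          # step 2a: 80% -> mask symbol
                corrupted[i] = mask_id
            elif r < 0.90:                        # step 2b: 10% -> random token
                corrupted[i] = random.randrange(vocab_size)
            # step 2c: remaining 10% -> token left unchanged
    return corrupted, targets                     # step 3: train to predict targets
```

The random-replacement and leave-unchanged branches mirror BERT's design: the [MASK] symbol never appears in downstream inputs, so corrupting some targets with non-mask tokens reduces the pre-training/fine-tuning mismatch and pushes the model to build contextual representations for every input token.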