Case Study

Critique of a Modified Pre-training Strategy

A data scientist is pre-training a language model using a masked prediction objective. In their setup, 15% of the input tokens are selected for prediction. However, to simplify the process, they decide to replace 100% of these selected tokens with a special [MASK] token. After pre-training, they observe that the model's performance on various downstream fine-tuning tasks is significantly worse than expected. Evaluate the data scientist's simplified pre-training strategy. Why did this modification likely lead to poor performance during fine-tuning, despite the model performing well on the pre-training objective itself?
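For concreteness, here is a minimal Python sketch contrasting the two corruption schemes. The 80/10/10 split follows the standard BERT masking recipe; the token ids, vocabulary size, and function names are illustrative assumptions, not the data scientist's actual code.

import random

MASK_ID = 103          # hypothetical id for the [MASK] token
VOCAB_SIZE = 30522     # hypothetical vocabulary size
IGNORE_INDEX = -100    # positions not selected for prediction are excluded from the loss

def corrupt_all_mask(tokens, select_prob=0.15):
    """The simplified scheme: every selected token is replaced with [MASK]."""
    inputs, labels = list(tokens), [IGNORE_INDEX] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < select_prob:
            labels[i] = tok       # the model must recover the original token
            inputs[i] = MASK_ID   # the input always shows [MASK] at this position
    return inputs, labels

def corrupt_bert_style(tokens, select_prob=0.15):
    """BERT-style scheme: of the selected tokens, 80% become [MASK],
    10% become a random token, and 10% are left unchanged."""
    inputs, labels = list(tokens), [IGNORE_INDEX] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < select_prob:
            labels[i] = tok
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK_ID                       # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(VOCAB_SIZE)  # 10%: replace with a random token
            # else: 10% of the time the original token is kept in the input
    return inputs, labels

The key difference to weigh in your answer: with corrupt_all_mask, every predicted position looks like [MASK] during pre-training, whereas fine-tuning inputs contain no [MASK] tokens at all.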

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Evaluation in Bloom's Taxonomy
