Case Study

Critique of a Modified Pre-training Strategy

A data scientist is pre-training a language model using a masked prediction objective. In their setup, 15% of the input tokens are selected for prediction. However, to simplify the process, they decide to replace 100% of these selected tokens with a special [MASK] token. After pre-training, they observe that the model's performance on various downstream fine-tuning tasks is significantly worse than expected. Evaluate the data scientist's simplified pre-training strategy. Why did this modification likely lead to poor performance during fine-tuning, despite the model performing well on the pre-training objective itself?
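For concreteness, here is a minimal Python sketch contrasting the two corruption schemes. The 80/10/10 split follows the standard BERT masking recipe; the token ids, vocabulary size, and function names are illustrative assumptions, not the data scientist's actual code.

import random

MASK_ID = 103          # hypothetical id for the [MASK] token
VOCAB_SIZE = 30522     # hypothetical vocabulary size
IGNORE_INDEX = -100    # positions not selected for prediction are excluded from the loss

def corrupt_all_mask(tokens, select_prob=0.15):
    """The simplified scheme: every selected token is replaced with [MASK]."""
    inputs, labels = list(tokens), [IGNORE_INDEX] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < select_prob:
            labels[i] = tok       # the model must recover the original token
            inputs[i] = MASK_ID   # the input always shows [MASK] at this position
    return inputs, labels

def corrupt_bert_style(tokens, select_prob=0.15):
    """BERT-style scheme: of the selected tokens, 80% become [MASK],
    10% become a random token, and 10% are left unchanged."""
    inputs, labels = list(tokens), [IGNORE_INDEX] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < select_prob:
            labels[i] = tok
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK_ID                       # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(VOCAB_SIZE)  # 10%: replace with a random token
            # else: 10% of the time the original token is kept in the input
    return inputs, labels

The key difference to weigh in your answer: with corrupt_all_mask, every predicted position looks like [MASK] during pre-training, whereas fine-tuning inputs contain no [MASK] tokens at all.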

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Evaluation in Bloom's Taxonomy
