Case Study

Diagnosing Pre-training Deficiencies

A development team has pre-trained an encoder-decoder model using a denoising objective. The only method used to corrupt the input text was to replace a random 15% of the words with a special placeholder symbol. When fine-tuning this model for a paraphrasing task, they observe a significant weakness: the model struggles to recognize that two sentences with different word orders can have the same meaning (e.g., 'The team celebrated the victory' vs. 'The victory was celebrated by the team'). Based on this observation, what is a likely deficiency in the model's pre-training, and which specific input corruption strategy could be introduced to mitigate this issue? Justify your answer.
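For concreteness, the corruption strategy described above, and one permutation-style alternative of the kind discussed in the denoising pre-training literature (e.g., BART-style shuffling), can be sketched as follows. This is a minimal illustration, not the team's actual pipeline: the placeholder symbol, function names, and word-level tokenization are assumptions made for the example.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical placeholder symbol

def mask_corruption(words, mask_prob=0.15, rng=random):
    """Replace a random ~15% of words with a placeholder, as described in the case."""
    return [MASK_TOKEN if rng.random() < mask_prob else w for w in words]

def shuffle_corruption(words, rng=random):
    """One candidate alternative: permute the word order so the model must
    reconstruct the original ordering from a reordered input (a permutation-style
    corruption, similar in spirit to BART's sentence/token shuffling)."""
    corrupted = words[:]
    rng.shuffle(corrupted)
    return corrupted

if __name__ == "__main__":
    sentence = "The team celebrated the victory".split()
    print(mask_corruption(sentence))     # e.g. ['The', '[MASK]', 'celebrated', 'the', 'victory']
    print(shuffle_corruption(sentence))  # e.g. ['victory', 'The', 'the', 'celebrated', 'team']
```

When answering, consider what each corruption forces the model to learn during reconstruction, and why the masking-only objective in the sketch may leave word-order relationships under-trained.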

Updated 2025-10-06

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science