Short Answer

Optimizing Pre-training Objectives for Large-Scale Models

A research lab is pre-training a large encoder-only transformer model on a massive text corpus. They observe that the model's performance on downstream language understanding tasks is not improving as expected, despite the large scale of training. One of the pre-training objectives involves predicting whether two input sentences are consecutive in the original text. Analyze this specific objective and explain why removing it might lead to better performance for a model trained at this scale.
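The objective described matches Next Sentence Prediction (NSP) as used in BERT-style pre-training. As background for answering, here is a minimal sketch of how such training pairs are typically constructed: half the pairs are truly consecutive sentences (positive), half pair a sentence with a random one from elsewhere in the corpus (negative). The function name and the exact 50/50 split are illustrative assumptions, not taken from the question.

```python
import random

def make_nsp_pairs(sentences, seed=0):
    """Build NSP-style training pairs (illustrative sketch).

    Roughly half the pairs are consecutive sentences (label 1);
    the rest pair a sentence with a random non-next sentence (label 0).
    """
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            # Positive example: the actual next sentence.
            pairs.append((sentences[i], sentences[i + 1], 1))
        else:
            # Negative example: a random sentence from elsewhere,
            # avoiding the true next sentence.
            j = rng.randrange(len(sentences))
            while j == i + 1:
                j = rng.randrange(len(sentences))
            pairs.append((sentences[i], sentences[j], 0))
    return pairs
```

Note that the negative examples mix a topic-coherence signal with the intended sentence-order signal, which is one commonly cited reason the task can be too easy to provide useful gradient signal at scale.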

Updated 2025-10-07

Tags

Data Science

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science