Learn Before
Optimizing Pre-training Objectives for Large-Scale Models
A research lab is pre-training a large encoder-only transformer model on a massive text corpus. They observe that the model's performance on downstream language understanding tasks is not improving as expected, despite the large scale of training. One of the pre-training objectives involves predicting whether two input sentences are consecutive in the original text. Analyze this specific objective and explain why removing it might lead to better performance for a model trained at this scale.
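For reference, below is a minimal sketch of how this kind of next-sentence objective is typically implemented: a binary classifier over the encoder's pooled [CLS] representation, whose loss is added to the masked-token loss during pre-training. The PyTorch code is an illustrative assumption rather than part of the original card; the class name NextSentenceHead and the random tensors standing in for encoder outputs and labels are hypothetical.

```python
import torch
import torch.nn as nn

class NextSentenceHead(nn.Module):
    """Binary classifier over the pooled [CLS] representation.

    Predicts whether sentence B actually follows sentence A in the
    original text (label 1) or was sampled at random (label 0).
    """
    def __init__(self, hidden_size: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, 2)

    def forward(self, cls_embedding: torch.Tensor) -> torch.Tensor:
        # cls_embedding: (batch_size, hidden_size) pooled [CLS] vector
        return self.classifier(cls_embedding)

# Hypothetical usage: the sentence-pair loss is combined with the
# masked-token loss during pre-training.
hidden_size, batch_size = 768, 4
nsp_head = NextSentenceHead(hidden_size)
cls_embedding = torch.randn(batch_size, hidden_size)  # stand-in for encoder output
is_next_labels = torch.randint(0, 2, (batch_size,))   # 1 = consecutive, 0 = random pair

nsp_loss = nn.functional.cross_entropy(nsp_head(cls_embedding), is_next_labels)
# total_loss = mlm_loss + nsp_loss  # the extra term the question asks about removing
```

Removing the objective simply means training on the masked-token loss alone, with the full input budget spent on that single task.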
Tags
Data Science
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
RoBERTa's Key Findings on Scaling
Impact of Removing NSP Loss in RoBERTa
General Direction for Pre-training: Scaling Simple Tasks
A research team is building a new language model for natural language understanding tasks. They have a fixed model architecture and a large computational budget, and they are debating the most effective pre-training strategy. Based on the key findings from subsequent improvements to encoder-only models, which approach is most likely to yield the best performance?
Diagnosing Pre-training Issues in Large-Scale Models