Diagnosing Pre-training Issues in Large-Scale Models
Based on findings from research on scaling up pre-trained language models, what is the most likely reason for the model's disappointing performance, and what change to the pre-training process would you recommend?
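For context when answering: the related cards below (on RoBERTa-style studies) point to the finding that encoder-only models improve most by dropping the next-sentence-prediction loss and scaling the simple masked-LM objective with more data, larger batches, and longer effective training. The following is a minimal sketch contrasting the two published setups; the PretrainConfig class and tokens_seen helper are illustrative names, not from any library, and the figures follow the BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) papers.

```python
# Illustrative comparison of a BERT-style vs. a RoBERTa-style pre-training
# configuration. Treat this as a sketch of the published setups, not a recipe.
from dataclasses import dataclass


@dataclass
class PretrainConfig:
    objectives: tuple[str, ...]  # pre-training losses
    masking: str                 # how MLM masks are chosen
    corpus_size_gb: int          # raw training text
    batch_size: int              # sequences per optimizer step
    max_steps: int               # optimizer steps


bert_style = PretrainConfig(
    objectives=("mlm", "nsp"),   # masked LM + next-sentence prediction
    masking="static",            # masks fixed once during preprocessing
    corpus_size_gb=16,           # BooksCorpus + English Wikipedia
    batch_size=256,
    max_steps=1_000_000,
)

roberta_style = PretrainConfig(
    objectives=("mlm",),         # NSP removed: found unhelpful at scale
    masking="dynamic",           # masks resampled on every pass over the data
    corpus_size_gb=160,          # roughly 10x more data
    batch_size=8_192,            # much larger batches
    max_steps=500_000,           # fewer steps, but far more tokens seen
)


def tokens_seen(cfg: PretrainConfig, seq_len: int = 512) -> int:
    """Rough upper bound on tokens processed during pre-training."""
    return cfg.batch_size * cfg.max_steps * seq_len


for name, cfg in [("BERT-style", bert_style), ("RoBERTa-style", roberta_style)]:
    print(f"{name}: objectives={cfg.objectives}, "
          f"~{tokens_seen(cfg):.2e} tokens processed")
```

At a fixed 512-token sequence length this works out to roughly a 16x jump in tokens processed, which makes the "scale a simple objective" takeaway concrete.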
Tags
Data Science
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
RoBERTa's Key Findings on Scaling
Impact of Removing NSP Loss in RoBERTa
General Direction for Pre-training: Scaling Simple Tasks
Optimizing Pre-training Objectives for Large-Scale Models
A research team is building a new language model for natural language understanding tasks. They have a fixed model architecture and a large computational budget, and they are debating the most effective pre-training strategy. Based on the primary findings demonstrated by subsequent improvements to encoder-only models, which approach is most likely to yield the best performance?