Short Answer

Diagnosing LLM Training Plateaus

A research team is training a large language model and observes that, after weeks of training, adding 50% more high-quality data to the training set yields a negligible decrease in the model's final loss. Based on the principle that loss (L) is a function of model size (N) and dataset size (D) as described by the relationship L(N, D) = aN^b + cD^d + ε_∞ (where b and d are negative exponents and ε_∞ is the irreducible loss), propose two distinct and plausible explanations for this phenomenon. For each explanation, identify the specific term(s) in the equation that would be the primary cause.
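To build intuition for the scenario, the scaling law can be evaluated numerically. The sketch below uses entirely hypothetical constants (a, b, c, d, ε_∞, and the sizes N and D are illustrative, not fitted values) to show how a 50% data increase can produce only a tiny loss improvement once the data term is small relative to the other terms.

```python
# Illustrative sketch of the scaling law L(N, D) = a*N**b + c*D**d + eps_inf.
# All constants are hypothetical, chosen only to demonstrate the plateau;
# b and d are negative, so each power-law term shrinks as N or D grows.

def loss(N, D, a=400.0, b=-0.34, c=410.0, d=-0.28, eps_inf=1.69):
    """Predicted loss for model size N (parameters) and dataset size D (tokens)."""
    return a * N**b + c * D**d + eps_inf

N = 70e9                      # fixed model size (hypothetical)
D = 1.4e12                    # current dataset size (hypothetical)
L_before = loss(N, D)
L_after = loss(N, 1.5 * D)    # 50% more data, same model

print(f"Loss before: {L_before:.4f}")
print(f"Loss after : {L_after:.4f}")
print(f"Improvement: {L_before - L_after:.4f}")
```

With these numbers the c·D^d term is already a small fraction of the total loss, so shrinking it further barely moves L; the irreducible ε_∞ and the model-size term a·N^b dominate, which maps directly onto the two classes of explanation the question asks for.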


Updated 2025-10-03


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science