Diagnosing LLM Training Plateaus
A research team is training a large language model and observes that, after weeks of training, adding 50% more high-quality data to their training set produces only a negligible decrease in the model's final loss. Based on the combined power law L(N, D) = aN^b + cD^d + ε_∞, where N is the model size, D is the dataset size, the exponents b and d are negative, and ε_∞ is the irreducible loss, propose two distinct and plausible explanations for this phenomenon. For each explanation, identify the specific term(s) in the equation that would be the primary cause.
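To make the two candidate explanations concrete, the minimal Python sketch below evaluates the combined power law and reports the relative loss reduction from adding 50% more data. Every constant in it (a, b, c, d, eps_inf, and the N and D values) is a hypothetical number chosen only for illustration, not a fitted scaling-law coefficient.

    # Minimal numeric sketch; all constants below are hypothetical, for illustration only.

    def loss(N, D, a, b, c, d, eps_inf):
        """Combined power law L(N, D) = a*N**b + c*D**d + eps_inf, with b, d < 0."""
        return a * N ** b + c * D ** d + eps_inf

    def rel_drop_from_more_data(N, D, a, b, c, d, eps_inf, factor=1.5):
        """Fractional loss reduction from scaling the dataset by `factor`."""
        before = loss(N, D, a, b, c, d, eps_inf)
        after = loss(N, factor * D, a, b, c, d, eps_inf)
        return (before - after) / before

    N, D = 7e9, 1.4e12  # hypothetical parameter and token counts

    # Explanation 1: the model-size term a*N**b dominates the total loss, so
    # shrinking the already-small data term c*D**d barely moves it.
    print(rel_drop_from_more_data(N, D, a=5e3, b=-0.30, c=1e2, d=-0.28, eps_inf=0.1))

    # Explanation 2: the loss sits near the irreducible floor eps_inf, which no
    # amount of additional data can reduce.
    print(rel_drop_from_more_data(N, D, a=1e2, b=-0.30, c=1e2, d=-0.28, eps_inf=1.8))

In both runs the relative improvement from 50% more data is a fraction of a percent, matching the plateau described in the question; the difference lies in which term (aN^b or ε_∞) is doing the dominating.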
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Chinchilla Scaling Law
Optimizing LLM Training Strategy
A research team is working to improve a large language model and is using the combined power law, L(N, D) = aN^b + cD^d + ε_∞, to guide their efforts. Their analysis shows that the term aN^b, which depends on the model's parameter count (N), is currently the largest contributor to the total loss. The term cD^d, which depends on the dataset size (D), is comparatively small. To achieve the most significant reduction in loss with their limited resources, what should the team prioritize?
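For the related "Optimizing LLM Training Strategy" scenario, a similarly hedged sketch (hypothetical constants again, not fitted values) compares a 50% increase in parameters N against a 50% increase in data D when aN^b is the dominant term, which is why scaling the model is the higher-leverage choice there.

    # Minimal sketch with hypothetical constants: a*N**b dominates the loss.

    def loss(N, D, a=5e3, b=-0.30, c=1e2, d=-0.28, eps_inf=0.1):
        """Combined power law L(N, D) = a*N**b + c*D**d + eps_inf, with b, d < 0."""
        return a * N ** b + c * D ** d + eps_inf

    N, D = 7e9, 1.4e12  # hypothetical parameter and token counts
    base = loss(N, D)
    print("scale N by 1.5x:", (base - loss(1.5 * N, D)) / base)  # roughly a 10% drop
    print("scale D by 1.5x:", (base - loss(N, 1.5 * D)) / base)  # well under 1%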