Visualizing Empirical Scaling Laws for LLM Loss
The empirical power-law relationships governing a language model's performance can be visualized by plotting the test loss against the model size $N$ (the number of parameters) and the training dataset size $D$ (the number of training tokens). These plots illustrate how the test loss predictably decreases as a function of $N$, defined mathematically as $L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}$, and as a function of $D$, defined as $L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}$, where $N_c$, $D_c$, $\alpha_N$, and $\alpha_D$ are empirically fitted constants. On log-log axes, each power law appears as a straight line, which is why scaling-law plots are conventionally drawn that way.
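As a concrete illustration, the Python sketch below plots both power laws on log-log axes. The constants ($N_c \approx 8.8 \times 10^{13}$, $\alpha_N \approx 0.076$, $D_c \approx 5.4 \times 10^{13}$, $\alpha_D \approx 0.095$) are the fitted values reported by Kaplan et al. (2020), used here as plausible placeholders; this page itself does not specify them.

```python
import numpy as np
import matplotlib.pyplot as plt

# Fitted constants from Kaplan et al. (2020) -- illustrative assumptions,
# not values stated on this page.
N_C, ALPHA_N = 8.8e13, 0.076   # model-size law: L(N) = (N_c / N) ** alpha_N
D_C, ALPHA_D = 5.4e13, 0.095   # data-size law:  L(D) = (D_c / D) ** alpha_D

N = np.logspace(6, 12, 100)    # model sizes (parameters)
D = np.logspace(7, 12, 100)    # dataset sizes (tokens)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.loglog(N, (N_C / N) ** ALPHA_N)
ax1.set(xlabel="Parameters N", ylabel="Test loss", title="L(N)")
ax2.loglog(D, (D_C / D) ** ALPHA_D)
ax2.set(xlabel="Tokens D", ylabel="Test loss", title="L(D)")
fig.tight_layout()
plt.show()   # each power law is a straight line on log-log axes
```

Because both relationships are pure power laws, doubling the resources always shrinks the loss by the same multiplicative factor, e.g. increasing $N$ tenfold lowers $L(N)$ by a factor of $10^{-\alpha_N} \approx 0.84$ under the assumed exponent.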