Concept

Visualizing Empirical Scaling Laws for LLM Loss

The empirical power-law relationships governing a language model's performance can be visualized by plotting test loss against model size $N$ and training dataset size $D$. These plots show how test loss decreases predictably as a function of $N$, given by $\mathcal{L}(N) = \left( \frac{N}{8.8 \times 10^{13}} \right)^{-0.076}$, and as a function of $D$, given by $\mathcal{L}(D) = \left( \frac{D}{5.4 \times 10^{13}} \right)^{-0.095}$.
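The two power laws above can be evaluated directly. A minimal sketch, assuming the constants $8.8 \times 10^{13}$ and $5.4 \times 10^{13}$ and the exponents $-0.076$ and $-0.095$ exactly as stated (the helper names `loss_from_params` and `loss_from_data` are illustrative, not from the source):

```python
# Constants taken from the scaling-law formulas above.
N_C = 8.8e13      # model-size scale constant
ALPHA_N = 0.076   # model-size exponent
D_C = 5.4e13      # dataset-size scale constant
ALPHA_D = 0.095   # dataset-size exponent

def loss_from_params(n: float) -> float:
    """Test loss as a function of model size N: L(N) = (N / N_C)^(-alpha_N)."""
    return (n / N_C) ** -ALPHA_N

def loss_from_data(d: float) -> float:
    """Test loss as a function of dataset size D: L(D) = (D / D_C)^(-alpha_D)."""
    return (d / D_C) ** -ALPHA_D

# Loss falls predictably as N (or D) grows by orders of magnitude.
for n in (1e8, 1e9, 1e10, 1e11):
    print(f"N = {n:.0e}  ->  L(N) = {loss_from_params(n):.3f}")
```

Because both laws are power functions, each appears as a straight line on a log-log plot, which is why scaling-law figures are typically drawn on logarithmic axes.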

[Figure: test loss plotted against model size $N$ and training dataset size $D$]

Updated 2026-04-21


Tags: Foundations of Large Language Models; Ch.2 Generative Models - Foundations of Large Language Models; Computing Sciences