Example

Power Law Fit for Test Loss vs. Model and Dataset Size

Plots of a language model's test loss against model size, denoted by $N$, and training dataset size, denoted by $D$, illustrate empirical scaling behavior. The data points in such plots are shown for illustration; the curves drawn through them are power-law fits, which appear as straight lines on log-log axes. The fit for test loss as a function of $N$ is $\mathcal{L}(N) = \big( \frac{N}{8.8 \times 10^{13}} \big)^{-0.076}$. Similarly, the fit for test loss as a function of $D$ is $\mathcal{L}(D) = \big( \frac{D}{5.4 \times 10^{13}} \big)^{-0.095}$.
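To make the two fits concrete, the following is a minimal Python sketch that evaluates them at a few sizes. The constants are taken from the formulas above; the function names, the sample sizes, and the reading of $N$ as a parameter count and $D$ as a token count are assumptions made for illustration only.

```python
# Constants copied from the fitted formulas in the text.
N_C = 8.8e13      # scale constant in the L(N) fit
ALPHA_N = 0.076   # exponent in the L(N) fit
D_C = 5.4e13      # scale constant in the L(D) fit
ALPHA_D = 0.095   # exponent in the L(D) fit


def loss_from_model_size(n: float) -> float:
    """Evaluate L(N) = (N / 8.8e13) ** -0.076 (hypothetical helper name)."""
    return (n / N_C) ** (-ALPHA_N)


def loss_from_dataset_size(d: float) -> float:
    """Evaluate L(D) = (D / 5.4e13) ** -0.095 (hypothetical helper name)."""
    return (d / D_C) ** (-ALPHA_D)


if __name__ == "__main__":
    # Sample sizes are arbitrary and chosen only to show the trend.
    for n in (1e6, 1e8, 1e10):
        print(f"N = {n:.0e}  ->  L(N) = {loss_from_model_size(n):.3f}")
    for d in (1e7, 1e9, 1e11):
        print(f"D = {d:.0e}  ->  L(D) = {loss_from_dataset_size(d):.3f}")
```

Because each fit is a pure power law, multiplying $N$ by any factor $k$ multiplies the predicted loss by $k^{-0.076}$, and likewise $k^{-0.095}$ for $D$. The small exponents mean that large increases in size yield only modest reductions in loss.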

Updated 2026-04-21

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Computing Sciences