Power Law Fit for Test Loss vs. Model and Dataset Size
Visualizations of a language model's test loss plotted against model size, denoted by N, and training dataset size, denoted by D, illustrate empirical scaling behavior. Individual data points are plotted to illustrate these relationships. Test loss as a function of N is defined as L(N) = C_N · N^(−α_N). Similarly, test loss as a function of D is defined as L(D) = C_D · D^(−α_D).
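As a minimal sketch, the two power laws above can be evaluated directly in Python. The constants and exponents here (C = 100, α_N = 0.076, α_D = 0.095) are illustrative assumptions in the spirit of published Kaplan-style fits, not values from this page:

```python
def loss_vs_model_size(n_params, C=100.0, alpha_n=0.076):
    # L(N) = C * N^(-alpha_N): test loss falls as a power of model size
    return C * n_params ** (-alpha_n)

def loss_vs_dataset_size(n_tokens, C=100.0, alpha_d=0.095):
    # L(D) = C * D^(-alpha_D): test loss falls as a power of dataset size
    return C * n_tokens ** (-alpha_d)

# On a log-log plot, log L = log C - alpha * log N is a straight line,
# which is why these empirical fits appear linear in the visualizations.
for n in (1e8, 1e9, 1e10):
    print(f"N={n:.0e}  L(N)={loss_vs_model_size(n):.4f}")
```

Because the relationship is a power law, each tenfold increase in N multiplies the loss by the same constant factor, 10^(−α_N).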

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Combined Power Law for LLM Loss with Model and Dataset Size
A research team is deciding between two language model sizes. Model A will have 10 billion parameters, and Model B will have 100 billion parameters. According to the empirical relationship in which performance loss (L) is a power-law function of the number of parameters (N), L(N) = C · N^(−α) with a small positive exponent α, which model should the team choose to achieve a lower final loss, and what is the justification?
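A brief numeric sketch of the comparison in this question, assuming an illustrative exponent α = 0.076 (a Kaplan-style value, not given in the question; the constant C cancels in the ratio):

```python
# Assumed power law L(N) = C * N^(-alpha); C cancels when comparing models.
alpha = 0.076  # illustrative exponent

# L(100B) / L(10B) = (100e9 / 10e9)^(-alpha) = 10^(-alpha)
ratio = (100e9 / 10e9) ** (-alpha)
print(f"L(Model B) / L(Model A) = {ratio:.3f}")
# The ratio is below 1, so the 100B-parameter model attains lower loss.
```

Under these assumptions the tenfold-larger Model B reduces loss by roughly 16%, which is the power-law justification for choosing it.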
Interpreting Model Scaling Effects
Interpreting the Model Scaling Formula
Visualizing Empirical Scaling Laws for LLM Loss
Power Law Fit for Test Loss vs. Model and Dataset Size
Combined Power Law for LLM Loss with Model and Dataset Size
Predicting LLM Performance Based on Dataset Size
A research team observes that their language model's loss (L) decreases as the training dataset size (D) increases, following the specific power law L(D) = C · D^(−α), where C is a large constant and the exponent α is a small positive number (e.g., 0.095). Based on this mathematical relationship, what is the most significant implication for the team as they consider scaling up their training data from an already very large starting point?
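A small worked example of the diminishing returns this question points at, using the exponent α = 0.095 stated in the question (the constant C cancels when comparing dataset sizes):

```python
# Assumed form L(D) = C * D^(-alpha); C cancels in the ratio below.
alpha = 0.095

# Doubling an already-large dataset multiplies the loss by 2^(-alpha).
factor = 2 ** (-alpha)
print(f"Loss multiplier from doubling D: {factor:.4f}")
# The multiplier is ~0.94: each doubling of the data cuts loss by only a
# few percent, so very large absolute increases in D buy small reductions.
```

This is the key implication: because α is small, returns diminish sharply, and each further loss reduction requires multiplicatively more data.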
Calculating Loss Reduction from Increased Dataset Size
Power Law Fit for Test Loss vs. Model and Dataset Size
Learn After
A research team plots the test loss versus the number of parameters for a series of language models on a log-log scale. They observe that the data points form a nearly perfect straight, downward-sloping line, indicating a predictable power-law relationship. However, their newest, largest model has a test loss that falls significantly above this established trend line. Which of the following is the most plausible explanation for this deviation?
Strategic Model Development Decision
Predicting Performance Improvement from Model Scaling