Formula

Empirical Power Law for LLM Loss vs. Model Size (N)

Research by Kaplan et al. (2020) demonstrated that, after an initial transient period, the performance of their language models improved as a power law in the model size, denoted by $N$. This empirical scaling behavior for model size is expressed mathematically as $\mathcal{L}(N) = \left( \frac{N}{8.8 \times 10^{13}} \right)^{-0.076}$, where $\mathcal{L}(N)$ is the loss of the model.
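As a minimal sketch, the formula above can be evaluated directly; the constants are the ones quoted in the text, and the loop over model sizes is purely illustrative:

```python
# Kaplan et al. (2020) power law for loss vs. model size N.
N_C = 8.8e13      # constant from the formula above
ALPHA_N = 0.076   # scaling exponent from the formula above

def loss(n_params: float) -> float:
    """Predicted loss for a model with n_params parameters."""
    return (n_params / N_C) ** -ALPHA_N

# Larger models yield lower predicted loss.
for n in (1e8, 1e9, 1e10):
    print(f"N = {n:.0e}  ->  L(N) = {loss(n):.3f}")
```

Because the exponent is negative, the predicted loss falls monotonically as $N$ grows, and it reaches exactly 1 when $N = 8.8 \times 10^{13}$.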

Updated 2026-04-21
