Learn Before
Empirical Power Law for LLM Loss vs. Model Size (N)
Research by Kaplan et al. (2020) demonstrated that, after an initial transient period, the performance of their language models improved as a power law in relation to the model size, denoted by N. This empirical scaling behavior for model size is expressed mathematically as L(N) = (N_c / N)^α_N, where L is the loss of the model, N_c ≈ 8.8 × 10^13 is a fitted constant (counted in non-embedding parameters), and α_N ≈ 0.076 is the fitted power-law exponent.
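A minimal sketch of this relationship in Python, using the constant values fitted by Kaplan et al. (2020); the function name is illustrative:

```python
# Sketch of the Kaplan et al. (2020) model-size power law:
#   L(N) = (N_c / N)^alpha_N
# The constants are the fits reported in the paper; N counts
# non-embedding parameters.

ALPHA_N = 0.076   # fitted exponent for model size
N_C = 8.8e13      # fitted constant, in non-embedding parameters

def loss_from_model_size(n_params: float) -> float:
    """Predicted loss for a model with n_params non-embedding parameters."""
    return (N_C / n_params) ** ALPHA_N

# Loss falls slowly but steadily as N grows by orders of magnitude.
for n in (1e8, 1e9, 1e10, 1e11):
    print(f"N = {n:.0e}  ->  predicted loss = {loss_from_model_size(n):.3f}")
```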

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Empirical Power Law for LLM Loss vs. Dataset Size (D)
Two language models, Model A and Model B, have their performance (loss, L) modeled as a function of a resource x (where x > 1). The relationship for each is described by a power law equation:
- Model A: L(x) = 0.5 * x^-0.1
- Model B: L(x) = 0.5 * x^-0.2

Based on these equations, which statement correctly analyzes the models' improvement as more of the resource x is used? (A numeric sketch appears after this Related list.)
Interpreting the Power Law Exponent
Model Selection Based on Performance Scaling
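For the Model A vs. Model B question above, a quick numeric sketch using the two given equations shows why the larger-magnitude exponent improves faster:

```python
# Sketch comparing the two power laws from the question:
#   Model A: L(x) = 0.5 * x**-0.1
#   Model B: L(x) = 0.5 * x**-0.2

def loss_a(x: float) -> float:
    return 0.5 * x ** -0.1

def loss_b(x: float) -> float:
    return 0.5 * x ** -0.2

for x in (10, 100, 1000):
    print(f"x = {x:4d}: L_A = {loss_a(x):.4f}, L_B = {loss_b(x):.4f}")

# Each 10x increase in x multiplies Model A's loss by 10**-0.1 (~0.794,
# a ~21% drop) but Model B's by 10**-0.2 (~0.631, a ~37% drop), so
# Model B gains more from each additional unit of resource x.
```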
Learn After
Combined Power Law for LLM Loss with Model and Dataset Size
A research team is deciding between two language model sizes. Model A will have 10 billion parameters, and Model B will have 100 billion parameters. According to the empirical relationship where performance loss (L) is a function of the number of parameters (N), L(N) = (N_c / N)^α_N, which model should the team choose to achieve a lower final loss, and what is the justification? (A worked sketch appears at the end of this section.)
Interpreting Model Scaling Effects
Interpreting the Model Scaling Formula
Visualizing Empirical Scaling Laws for LLM Loss
Power Law Fit for Test Loss vs. Model and Dataset Size
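For the 10B vs. 100B parameter question above, a worked sketch assuming the Kaplan et al. (2020) exponent α_N ≈ 0.076; the constant N_c cancels in the ratio, so only the exponent matters:

```python
# Sketch for the 10B vs. 100B question, assuming L(N) = (N_c / N)**ALPHA_N
# with the Kaplan et al. (2020) fit ALPHA_N ~= 0.076.

ALPHA_N = 0.076

def loss_ratio(n_small: float, n_large: float) -> float:
    """Ratio L(n_large) / L(n_small) under the power law; N_c cancels."""
    return (n_small / n_large) ** ALPHA_N

ratio = loss_ratio(10e9, 100e9)
print(f"L(100B) / L(10B) = {ratio:.3f}")
# ~0.839: the 100B model's predicted loss is ~16% lower, so choose Model B.
```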