Formula

Power Law Formula for LLM Loss

The simplest mathematical relationship for the loss of a model, $\mathcal{L}(x)$, as a function of a variable of interest, $x$, is the power law $\mathcal{L}(x) = a x^b$. Here $a$ and $b$ are parameters that must be estimated empirically. When the loss decreases as $x$ grows, $b$ is negative. Despite its simplicity, this function has successfully described how the loss of both language models and machine translation systems scales with model size ($N$) or training dataset size ($D$).
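
Estimating $a$ and $b$ "empirically" can be illustrated with a short sketch. Taking logarithms gives $\log \mathcal{L} = \log a + b \log x$, which is a straight line in log-log space. Its slope is $b$ and its intercept is $\log a$, so ordinary least squares on the logged values recovers both parameters. This is one common fitting choice, assumed here for illustration, and not necessarily the procedure used in the studies referred to above. The function names `fit_power_law` and `predict_loss` are hypothetical. The data points are synthetic, generated from $a = 10$ and $b = -0.1$, and are not measurements from any real model.

```python
import math


def fit_power_law(xs, losses):
    """Fit L(x) = a * x**b by ordinary least squares in log-log space.

    Taking logs gives log L = log a + b * log x: a straight line whose
    slope is b and whose intercept is log a.
    """
    if len(xs) != len(losses) or len(xs) < 2:
        raise ValueError("need at least two (x, loss) pairs of equal length")
    log_x = [math.log(x) for x in xs]
    log_l = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_l = sum(log_l) / n
    # Slope of the least-squares line = covariance / variance of log x.
    cov = sum((u - mean_x) * (v - mean_l) for u, v in zip(log_x, log_l))
    var = sum((u - mean_x) ** 2 for u in log_x)
    if var == 0:
        raise ValueError("x values must not all be equal")
    b = cov / var
    # The intercept in log space is log a, so exponentiate to recover a.
    a = math.exp(mean_l - b * mean_x)
    return a, b


def predict_loss(a, b, x):
    """Evaluate the fitted power law at a (possibly unseen) scale x."""
    return a * x ** b


# Hypothetical, noise-free data generated from a = 10, b = -0.1
# (illustrative values only, not results from any real model).
model_sizes = [1e6, 1e7, 1e8, 1e9]
losses = [10.0 * n ** -0.1 for n in model_sizes]

a_hat, b_hat = fit_power_law(model_sizes, losses)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
print(f"predicted loss at N = 1e10: {predict_loss(a_hat, b_hat, 1e10):.3f}")
```

Because the synthetic points lie exactly on a power law, the fit recovers $a \approx 10$ and $b \approx -0.1$. Extrapolating to $N = 10^{10}$ then predicts a loss of about $1.0$. Real measurements are noisy, and the fitted line is only as trustworthy as the range of scales it was fitted on.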


Updated 2026-04-21


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences