Learn Before
Power Law Formula for LLM Loss
The simplest mathematical relationship representing the loss of a model, $L(x)$, given a specific variable of interest, $x$, is defined by the power law equation: $L(x) = \alpha \cdot x^{\beta}$. In this formula, $\alpha$ and $\beta$ are parameters that must be estimated empirically. Despite being a simple function, it has successfully described the scaling behavior of both language models and machine translation systems when scaling model size ($N$) or training dataset size ($D$).
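A minimal sketch of how $\alpha$ and $\beta$ could be estimated empirically, assuming synthetic (x, loss) measurements and illustrative variable names: taking logarithms turns the power law into a straight line, so an ordinary least-squares fit in log-log space recovers both parameters.

```python
import numpy as np

# Hypothetical (x, loss) measurements -- e.g. model size N vs. final test loss.
# These numbers are illustrative, not taken from the course material.
x_obs = np.array([1e7, 1e8, 1e9, 1e10])
loss_obs = np.array([4.2, 3.5, 2.9, 2.4])

# L(x) = alpha * x**beta  =>  log L = log(alpha) + beta * log(x),
# so a degree-1 fit in log-log space gives beta (slope) and log(alpha) (intercept).
beta_hat, log_alpha_hat = np.polyfit(np.log(x_obs), np.log(loss_obs), 1)
alpha_hat = np.exp(log_alpha_hat)

print(f"alpha ~= {alpha_hat:.2f}, beta ~= {beta_hat:.3f}")  # beta is expected to be negative
```

A negative fitted $\beta$ reproduces the diminishing-returns behavior discussed here: loss keeps falling as $x$ grows, but by ever smaller amounts.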

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Power Law Formula for LLM Loss
Improved Power Law for LLM Loss with Irreducible Error
A research team is developing a series of language models. They systematically increase the number of model parameters and measure the final test loss for each model. They observe a consistent trend: as the number of parameters grows, the test loss steadily decreases. However, the amount of improvement (loss reduction) becomes progressively smaller for each subsequent increase in parameters. Which of the following mathematical forms would be the most straightforward initial choice to model this observed relationship between model size and loss?
Modeling LLM Performance with Power Laws
Modeling LLM Performance Trends
Learn After
Empirical Power Law for LLM Loss vs. Model Size (N)
Empirical Power Law for LLM Loss vs. Dataset Size (D)
Two language models, Model A and Model B, have their performance (loss, L) modeled as a function of a resource x (where x > 1). The relationship for each is described by a power law equation:
- Model A: L(x) = 0.5 * x^-0.1
- Model B: L(x) = 0.5 * x^-0.2

Based on these equations, which statement correctly analyzes the models' improvement as more of the resource x is used? (A numeric comparison is sketched after this list of related items.)
Interpreting the Power Law Exponent
Model Selection Based on Performance Scaling
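The comparison asked about in the Learn After question can be checked numerically. The sketch below takes the coefficients and exponents directly from the question (everything else is illustrative) and evaluates both power laws at increasing x.

```python
def loss_a(x: float) -> float:
    # Model A: L(x) = 0.5 * x^-0.1
    return 0.5 * x ** -0.1

def loss_b(x: float) -> float:
    # Model B: L(x) = 0.5 * x^-0.2
    return 0.5 * x ** -0.2

for x in (10, 100, 1_000, 10_000):
    print(f"x={x:>6}  L_A={loss_a(x):.4f}  L_B={loss_b(x):.4f}")

# Both losses shrink as x grows, but Model B's loss shrinks faster
# because its exponent has the larger magnitude (-0.2 vs. -0.1).
```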