Formula

Combined Power Law for LLM Loss with Model and Dataset Size

To account for multiple factors simultaneously, the loss of a Large Language Model can be modeled as a function of both the number of model parameters, $N$, and the size of the training dataset, $D$. This relationship is captured by a combined scaling law developed by Rosenfeld et al., which incorporates an irreducible error term, $\epsilon_{\infty}$, resulting in the formula: $\mathcal{L}(N,D) = aN^b + cD^d + \epsilon_{\infty}$. In this equation, the terms $aN^b$ and $cD^d$ represent the independent contributions of model size and dataset size to the overall loss, each following a power law.


Updated 2026-05-02

