Power Law Formulation for LLM Loss
When modeling the performance of a large language model (LLM), the loss, denoted as $L$, is often expressed as a function of a variable of interest, $x$. This variable can represent factors such as the number of model parameters, and the loss is typically measured using metrics such as the cross-entropy loss on test data. The most straightforward mathematical form used to model this relationship is a power law, $L(x) = a \cdot x^{-b}$, where $a > 0$ and $b > 0$ are constants fitted to observed data.
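As a quick illustration of how such a fit is obtained in practice, the sketch below fits the power-law form to a handful of synthetic (model size, test loss) measurements with SciPy's curve_fit. The data points, starting values, and the power_law helper are made up for this example; they are not from any particular paper.

```python
# A minimal sketch of fitting the power law L(x) = a * x**(-b) to
# observed (x, loss) pairs. All numbers below are synthetic.
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, b):
    """Simple power-law form: L(x) = a * x^(-b)."""
    return a * np.power(x, -b)

# Hypothetical measurements: model sizes (parameter counts) and test losses.
sizes = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
losses = np.array([4.2, 3.6, 3.1, 2.7, 2.4])

# p0 provides a reasonable starting point for the optimizer.
(a, b), _ = curve_fit(power_law, sizes, losses, p0=(10.0, 0.1))
print(f"fitted: L(x) = {a:.2f} * x^(-{b:.3f})")

# Extrapolate the fitted curve to a larger model.
print(f"predicted loss at 1e10 parameters: {power_law(1e10, a, b):.3f}")
```

In practice the fit is often done in log space, where the power law becomes the straight line $\log L = \log a - b \log x$ (a straight line on a log-log plot is a common sanity check); the direct fit above is kept for brevity.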
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Related
A research team observes the performance of a new model as they increase the amount of training data. They plot the test error and notice a distinct pattern: the error first decreases, then temporarily increases, before decreasing again. When attempting to create a mathematical formula to predict this error curve, which approach would be most appropriate? (A sketch after this list illustrates why a plain power law falls short here.)
Modeling an Unconventional Learning Curve
Evaluating Modeling Approaches for LLM Learning Curves
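For intuition on the question above: a single power law $a \cdot x^{-b}$ is strictly monotonic, so it cannot by itself reproduce a decrease-increase-decrease pattern. The sketch below, using synthetic numbers and a purely hypothetical bump term, makes the contrast concrete.

```python
# Synthetic illustration: a plain power law is monotonic, so an error
# curve that dips, rises, and dips again needs extra structure.
# The Gaussian bump term here is purely hypothetical.
import numpy as np

x = np.logspace(6, 9, 7)                 # synthetic training-set sizes

plain = 30.0 * x ** -0.12                # monotone power law
bumped = plain + 2.0 * np.exp(-(np.log10(x) - 7.5) ** 2)

print(np.sign(np.diff(plain)))   # all -1: strictly decreasing
print(np.sign(np.diff(bumped)))  # mixed signs: the curve rises locally
```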
Learn After
Power Law Formula for LLM Loss
Improved Power Law for LLM Loss with Irreducible Error
A research team is developing a series of language models. They systematically increase the number of model parameters and measure the final test loss of each model. They observe a consistent trend: as the number of parameters grows, the test loss steadily decreases, but the improvement (loss reduction) becomes progressively smaller with each subsequent increase. Which of the following mathematical forms would be the most straightforward initial choice to model this relationship between model size and loss? (A numeric sketch follows this list.)
Modeling LLM Performance with Power Laws
Modeling LLM Performance Trends
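As a numeric sketch for the question above (the constants a and b are made up), note that under a power law $L(N) = a \cdot N^{-b}$ each doubling of the parameter count multiplies the loss by the same factor $2^{-b}$, so the absolute loss reduction shrinks at every step, exactly the diminishing-returns pattern described:

```python
# Diminishing returns under a power law L(N) = a * N**(-b).
# Each doubling of N scales the loss by the constant factor 2**(-b),
# so absolute reductions shrink step by step. Constants are illustrative.
a, b = 30.0, 0.12

N = 1e8
for _ in range(4):
    before, after = a * N ** -b, a * (2 * N) ** -b
    print(f"N={N:.0e}: loss {before:.3f} -> {after:.3f} "
          f"(reduction {before - after:.3f})")
    N *= 2
```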