Formula

Power Law Formulation for LLM Loss

When modeling the performance of a large language model (LLM), the loss, denoted $\mathcal{L}(x)$, is often expressed as a function of a variable of interest $x$. This variable can represent factors such as the number of model parameters, and the loss $\mathcal{L}(x)$ is typically measured with metrics such as the cross-entropy loss on test data. The most straightforward mathematical form for this relationship is a power law.
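
As an illustration, here is a minimal sketch assuming the common two-parameter form $\mathcal{L}(x) = a x^{b}$, where $a > 0$ and $b$ (typically negative, so loss falls as $x$ grows) are constants estimated from data. Taking logarithms gives $\log \mathcal{L}(x) = \log a + b \log x$, which is linear in $\log x$, so $a$ and $b$ can be estimated with ordinary least squares on log-transformed measurements. The function names `power_law_loss` and `fit_power_law` and the sample values below are hypothetical and chosen only for illustration; they do not come from any real model.

```python
import math
from typing import Sequence, Tuple


def power_law_loss(x: float, a: float, b: float) -> float:
    """Evaluate the assumed power law L(x) = a * x**b."""
    return a * x ** b


def fit_power_law(xs: Sequence[float], losses: Sequence[float]) -> Tuple[float, float]:
    """Estimate (a, b) by least squares on log L = log a + b * log x.

    Assumes every x and every loss is strictly positive.
    """
    if len(xs) != len(losses) or len(xs) < 2:
        raise ValueError("need at least two (x, loss) pairs of equal length")
    log_x = [math.log(x) for x in xs]
    log_l = [math.log(loss) for loss in losses]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_l = sum(log_l) / n
    # Slope of the log-log regression line is the exponent b.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    if sxx == 0.0:
        raise ValueError("x values must not all be equal")
    sxy = sum((u - mean_x) * (v - mean_l) for u, v in zip(log_x, log_l))
    b = sxy / sxx
    # Intercept of the regression line is log a.
    log_a = mean_l - b * mean_x
    return math.exp(log_a), b


if __name__ == "__main__":
    # Hypothetical parameter counts and synthetic losses generated with a=20, b=-0.1.
    sizes = [1e6, 1e7, 1e8, 1e9]
    losses = [power_law_loss(n, a=20.0, b=-0.1) for n in sizes]
    a_hat, b_hat = fit_power_law(sizes, losses)
    print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
    # Extrapolate the fitted curve to a larger (hypothetical) model size.
    print(f"predicted loss at 1e10 parameters: {power_law_loss(1e10, a_hat, b_hat):.4f}")
```

Fitting in log space keeps the problem linear and avoids an iterative optimizer. With noisy real measurements the recovered constants are only estimates, and extrapolating far beyond the range of observed $x$ should be treated with caution.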

Updated 2026-04-21

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences