
Empirical Reward Model Loss Formula

The theoretical reward model loss, defined as an expectation, is implemented in practice as an empirical loss by averaging over the collected preference dataset $\mathcal{D}_r$, under the assumption that the data points are sampled uniformly. The empirical loss is

$$\mathcal{L}_r(\phi) = -\frac{1}{|\mathcal{D}_r|} \sum_{(\mathbf{x},\mathbf{y}_a,\mathbf{y}_b)\in\mathcal{D}_r} \log \mathrm{Pr}_{\phi}(\mathbf{y}_a \succ \mathbf{y}_b \mid \mathbf{x}),$$

where $|\mathcal{D}_r|$ is the total number of preference pairs in the dataset, and each tuple $(\mathbf{x},\mathbf{y}_a,\mathbf{y}_b)$ records that response $\mathbf{y}_a$ is preferred over $\mathbf{y}_b$ for prompt $\mathbf{x}$.
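As a minimal sketch of this averaging, the loop below computes the empirical loss over a list of preference triples. It assumes the pairwise preference probability takes the Bradley–Terry form $\mathrm{Pr}_\phi(\mathbf{y}_a \succ \mathbf{y}_b \mid \mathbf{x}) = \sigma(r_\phi(\mathbf{x},\mathbf{y}_a) - r_\phi(\mathbf{x},\mathbf{y}_b))$, which is standard in reward modeling but not stated in this excerpt; `reward_fn` is a hypothetical stand-in for the parameterized reward model $r_\phi$.

```python
import math

def pref_prob(r_a, r_b):
    # Assumed Bradley-Terry form: Pr(y_a > y_b | x) = sigmoid(r_a - r_b).
    return 1.0 / (1.0 + math.exp(-(r_a - r_b)))

def empirical_reward_loss(dataset, reward_fn):
    """Average negative log-likelihood over the preference dataset D_r.

    dataset   -- list of (x, y_a, y_b) triples, y_a preferred over y_b
    reward_fn -- hypothetical scalar reward model r_phi(x, y)
    """
    total = 0.0
    for x, y_a, y_b in dataset:
        p = pref_prob(reward_fn(x, y_a), reward_fn(x, y_b))
        total += math.log(p)
    # Negate and divide by |D_r|, matching the formula above.
    return -total / len(dataset)
```

With an uninformative reward model (equal scores for both responses), each pair contributes $-\log 0.5 = \log 2$, so the loss is $\log 2 \approx 0.693$; a model that cleanly separates the preferred response drives the loss toward zero.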


Updated 2026-05-02


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
