Empirical Reward Model Loss Formula
The theoretical reward model loss, defined as an expectation, is practically implemented as an empirical loss by averaging over the collected preference dataset $D$. This is based on the assumption that the data points are sampled uniformly. The formula for this empirical loss is:

$$L(\phi) = -\frac{1}{N} \sum_{(x, y_a, y_b) \in D} \log \sigma\big(r_\phi(x, y_a) - r_\phi(x, y_b)\big)$$

Here, $N = |D|$ represents the total number of preference pairs in the dataset, $y_a$ is the response preferred over $y_b$ for input $x$, $r_\phi$ is the reward model with parameters $\phi$, and $\sigma$ is the sigmoid function.
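To make the averaging concrete, here is a minimal PyTorch sketch of the empirical loss; the function name `empirical_reward_loss` and the batched reward tensors are illustrative assumptions, not an implementation from the book.

```python
import torch
import torch.nn.functional as F

def empirical_reward_loss(r_preferred: torch.Tensor,
                          r_rejected: torch.Tensor) -> torch.Tensor:
    # r_preferred[i] = r_phi(x_i, y_a) and r_rejected[i] = r_phi(x_i, y_b),
    # one entry per preference pair, each of shape (N,).
    # logsigmoid is the numerically stable form of log(sigmoid(.)),
    # and .mean() performs the 1/N averaging over the dataset.
    return -F.logsigmoid(r_preferred - r_rejected).mean()

# Toy usage with N = 3 hypothetical preference pairs.
r_a = torch.tensor([1.2, 0.3, 2.0])  # scores of preferred responses y_a
r_b = torch.tensor([0.4, 0.9, 1.5])  # scores of rejected responses y_b
print(empirical_reward_loss(r_a, r_b).item())
# The implied preference probabilities sigma(r_a - r_b):
print(torch.sigmoid(r_a - r_b))
```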

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Pair-wise Ranking Loss Formula for RLHF Reward Model
Empirical Reward Model Loss Formula using Bradley-Terry Model
A reward model is trained to learn human preferences by minimizing the following loss function, which is an expectation over a preference dataset $D$:

$$L(\phi) = -\mathbb{E}_{(x, y_a, y_b) \sim D}\big[\log \sigma\big(r_\phi(x, y_a) - r_\phi(x, y_b)\big)\big]$$

In this dataset, $y_a$ represents a response preferred over response $y_b$ for a given input $x$. What is the primary effect of successfully minimizing this loss function on the model's behavior?
Reward Model Training Diagnosis
Composition of Reward Model Parameters (ϕ)
Approximating Expected Loss with Empirical Loss
Empirical Reward Model Loss Formula
Impact of Prediction Confidence on Reward Model Loss
Listwise Loss Formula from Accumulated Pairwise Comparisons
Empirical Reward Model Loss Formula
Empirical Formulation of Pair-wise Ranking Loss
A system learns a function, r(input, response), that assigns a numerical score indicating the quality of a response for a given input. The probability that response Y_a is preferred over response Y_b is then calculated using the formula: Probability = Sigmoid(r(input, Y_a) - r(input, Y_b)), where Sigmoid(z) = 1 / (1 + e^-z). Given the following scenarios for a single input, which one presents a logical inconsistency between the assigned scores and the resulting preference probability?
Preference Probability Calculation
Invariance of Preference Probability
Learn After
Impact of Data Distribution on Reward Model Training
A researcher is training a reward model using a small preference dataset, $D$, which contains exactly two preference pairs:
- For input $x_1$, response $y_{a,1}$ is preferred over $y_{b,1}$.
- For input $x_2$, response $y_{a,2}$ is preferred over $y_{b,2}$.
Given the empirical loss formula $L(\phi) = -\frac{1}{N} \sum_{(x, y_a, y_b) \in D} \log \sigma(r_\phi(x, y_a) - r_\phi(x, y_b))$, which of the following expressions correctly represents the loss for this specific dataset? (A worked instantiation is sketched after this list.)
Comparing Reward Model Performance
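For the two-pair dataset in the question above, the empirical loss instantiates with $N = 2$ as follows; the subscript convention $(x_1, y_{a,1}, y_{b,1})$ and $(x_2, y_{a,2}, y_{b,2})$ is an illustrative assumption:

$$L(\phi) = -\frac{1}{2}\Big[\log \sigma\big(r_\phi(x_1, y_{a,1}) - r_\phi(x_1, y_{b,1})\big) + \log \sigma\big(r_\phi(x_2, y_{a,2}) - r_\phi(x_2, y_{b,2})\big)\Big]$$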