Learn Before
Approximating Expected Loss with Empirical Loss
In practice, the theoretical reward model loss, which is defined as an expectation over the entire data distribution, is replaced with an empirical loss computed as a summation (average) over the collected dataset Dr. This substitution is justified when the preference pairs (x, ya, yb) in Dr are treated as uniform samples from the underlying distribution, so the expectation can be approximated directly from the available data.
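This approximation can be sketched in a few lines of Python. The reward function below is a hypothetical stand-in for a trained rϕ (any scalar-scoring function would do), and the per-pair loss follows the Bradley-Terry pairwise form, −log σ(rϕ(x, ya) − rϕ(x, yb)):

```python
import math

def reward(x, y):
    # Hypothetical stand-in for a trained reward model r_phi(x, y):
    # a toy deterministic score used purely for illustration.
    return 0.1 * len(y) + 0.05 * len(x)

def pairwise_loss(x, y_a, y_b):
    # Bradley-Terry pairwise ranking loss for one preference pair,
    # where y_a is the preferred response: -log sigma(r(x,y_a) - r(x,y_b)).
    margin = reward(x, y_a) - reward(x, y_b)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def empirical_loss(dataset):
    # Empirical loss: the average per-pair loss over the collected
    # dataset Dr, which approximates the expectation under the
    # uniform-sampling assumption.
    return sum(pairwise_loss(x, ya, yb) for x, ya, yb in dataset) / len(dataset)

# Toy dataset of (x, ya, yb) triples, with ya preferred over yb.
dataset = [
    ("what is rlhf", "a detailed helpful answer", "short"),
    ("define loss", "the objective minimized during training", "idk"),
]
print(empirical_loss(dataset))
```

Because the dataset triples are assumed to be uniform samples, the printed average is a Monte Carlo estimate of the expected loss; with more collected pairs from the same distribution, the estimate tightens.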
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Pair-wise Ranking Loss Formula for RLHF Reward Model
Empirical Reward Model Loss Formula using Bradley-Terry Model
A reward model is trained to learn human preferences by minimizing the following loss function, which is an expectation over a preference dataset Dr:

L(ϕ) = −E_(x, ya, yb)∼Dr [log σ(rϕ(x, ya) − rϕ(x, yb))]

In this dataset, ya represents a response preferred over response yb for a given input x. What is the primary effect of successfully minimizing this loss function on the model's behavior?
Reward Model Training Diagnosis
Composition of Reward Model Parameters (ϕ)
Approximating Expected Loss with Empirical Loss
Empirical Reward Model Loss Formula
Impact of Prediction Confidence on Reward Model Loss
Learn After
A team is training a model to predict user preferences between two generated text responses. The training objective is to minimize the average loss calculated over a collected dataset of preferences. However, the data collection was flawed, resulting in a dataset that primarily contains preferences from a very specific, non-representative group of users. What is the most significant risk of using the average loss on this particular dataset as the primary metric for training the model?
From Theory to Practice: Expected vs. Empirical Loss
If a dataset used for training a preference model is extremely large, the average loss calculated over this dataset is guaranteed to be a highly accurate approximation of the theoretical loss over the entire data distribution, even if the data was collected from a narrow, specific user group.
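The point behind the last two items can be made concrete with a toy simulation (all numbers here are hypothetical). Suppose the full user population is an even mix of two groups with different preference rates, but the collected dataset only samples one group. However large the dataset grows, its average loss converges to that group's expectation, not the population's:

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy setup: the model scores response A above response B by a fixed margin.
MARGIN = 1.0
LOSS_A = -math.log(sigmoid(MARGIN))    # loss when the annotator prefers A
LOSS_B = -math.log(sigmoid(-MARGIN))   # loss when the annotator prefers B

def sample_pair_loss(p_prefer_a):
    # One preference pair: with probability p_prefer_a the annotator prefers A.
    return LOSS_A if random.random() < p_prefer_a else LOSS_B

def average_loss(p_prefer_a, n):
    # Empirical (average) loss over n sampled preference pairs.
    return sum(sample_pair_loss(p_prefer_a) for _ in range(n)) / n

# Hypothetical population: 50% of users prefer A 90% of the time,
# the other 50% prefer A only 50% of the time.
true_expectation = 0.5 * (0.9 * LOSS_A + 0.1 * LOSS_B) \
                 + 0.5 * (0.5 * LOSS_A + 0.5 * LOSS_B)

# Narrow dataset: only the p = 0.9 group, no matter how large n grows.
biased_estimate = average_loss(0.9, 100_000)
print(round(biased_estimate, 3), round(true_expectation, 3))
```

Even at n = 100,000 the empirical average settles near the narrow group's expectation and remains a biased estimate of the population loss: dataset size controls the variance of the approximation, not its sampling bias.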