Value Function Loss in RLHF

The value model in RLHF, which estimates the expected future reward from a given state, is trained simultaneously with the policy model. Its training objective is to minimize the Mean Squared Error (MSE) between its predicted state value and a bootstrapped target formed from the immediate reward and the predicted value of the next state. This is equivalent to minimizing the squared Temporal Difference (TD) error. The loss function is:

$$
\mathcal{L}(\omega) = \frac{1}{M} \sum_{x \in D} \sum_{t=1}^{T} \left( r_t + \gamma\, V_\omega(x, y_{<t+1}) - V_\omega(x, y_{<t}) \right)^2
$$

where $V_\omega$ is the value function with parameters $\omega$, $M$ is the number of samples in the dataset $D$, $r_t$ is the reward received at step $t$, and $\gamma$ is the discount factor. The target $r_t + \gamma V_\omega(x, y_{<t+1})$ is treated as a fixed value during the gradient calculation for this loss, so the gradient flows only through the prediction $V_\omega(x, y_{<t})$.
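
To make the objective concrete, the following is a minimal PyTorch sketch of this loss, assuming per-prefix value predictions and per-step rewards are already available as tensors. The function name `td_value_loss` and the tensor shapes are illustrative assumptions, not part of the original formulation.

```python
import torch

def td_value_loss(values: torch.Tensor, rewards: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Squared-TD-error loss for the value model.

    values:  (B, T+1) tensor; values[:, t] approximates V_w(x, y_{<t+1}),
             i.e. one prediction per prefix, including the final state.
    rewards: (B, T) tensor of per-step rewards r_t.
    """
    # Bootstrapped target r_t + gamma * V(next state). detach() holds the
    # target fixed so gradients flow only through the current prediction.
    targets = rewards + gamma * values[:, 1:].detach()
    td_errors = targets - values[:, :-1]
    # Sum the squared errors over time steps t = 1..T, then average over
    # the batch (the 1/M factor in the formula).
    return td_errors.pow(2).sum(dim=1).mean()

# Example usage with random tensors standing in for real model outputs:
B, T = 4, 16
values = torch.randn(B, T + 1, requires_grad=True)
rewards = torch.randn(B, T)
loss = td_value_loss(values, rewards)
loss.backward()
```

The `detach()` call implements the convention stated above: the target is treated as a constant during backpropagation, which prevents the bootstrapped part of the target from chasing its own updates.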
