Optimal Reward Model Parameter Estimation
The goal of training the reward model is to find the optimal set of parameters that minimizes the loss function over the preference dataset. This optimization problem can be expressed formally using the arg min operator. Using φ to denote the parameters and L_r(φ) for the loss function, the objective is given by: φ̂ = arg min_φ L_r(φ). The arg min operation selects the parameter values that yield the lowest possible loss, thereby aligning the model with the human preference data.
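To make the objective concrete, here is a minimal Python sketch, assuming the pair-wise Bradley-Terry ranking loss −log σ(r(preferred) − r(rejected)) described in the related cards. The `reward` function, `pairwise_loss`, and the toy dataset are illustrative stand-ins, not the book's implementation; a real reward model computes scores with a neural network.

```python
import math

def reward(phi, prompt, response):
    # Toy stand-in for a learned reward model r(prompt, response; phi):
    # score a crude feature of the response, weighted by the scalar phi.
    return phi * len(set(response.split()))

def pairwise_loss(phi, dataset):
    # Empirical loss L_r(phi): average negative log-sigmoid of the score
    # gap between preferred and rejected responses (Bradley-Terry model).
    total = 0.0
    for prompt, preferred, rejected in dataset:
        gap = reward(phi, prompt, preferred) - reward(phi, prompt, rejected)
        total += -math.log(1.0 / (1.0 + math.exp(-gap)))
    return total / len(dataset)

# Hypothetical preference data: (prompt, preferred response, rejected response).
dataset = [
    ("Explain gravity.", "Gravity pulls masses toward each other.", "No idea."),
    ("Define entropy.", "Entropy measures disorder in a system.", "It is a word."),
]

# phi_hat = arg min_phi L_r(phi): keep the candidate with the lowest loss.
candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]
phi_hat = min(candidates, key=lambda phi: pairwise_loss(phi, dataset))
print(phi_hat, pairwise_loss(phi_hat, dataset))
```

In practice φ̂ is found by gradient descent over millions of parameters rather than a grid over one scalar; the grid search here only makes the arg min step explicit.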

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Optimal Reward Model Parameter Estimation
Empirical Reward Model Loss Formula using Bradley-Terry Model
Pair-wise Ranking Loss Formula for RLHF Reward Model
Correcting a Reward Model's Preference Error
A reward model is being trained using a dataset where each entry consists of a prompt, a 'preferred' response, and a 'rejected' response, as judged by humans. The training process works by adjusting the model's parameters to minimize a ranking loss function. What is the primary effect of successfully minimizing this ranking loss?
A reward model is being trained on a dataset of human preferences, where each data point consists of a prompt, a preferred response, and a rejected response. The training process aims to minimize a ranking loss function. For a single data point, which of the following outcomes would generate the largest loss value, thereby prompting the most significant update to the model's parameters?
Reusing Transformer Training for Reward Models
A2C Actor Loss Function
Optimal Reward Model Parameter Estimation
Fine-Tuning Objective Function
Denoising Autoencoder Training Objective
Language Model Loss as Negative Expected Utility
MLM Training Objective using Cross-Entropy Loss
Training Objective as Loss Minimization over a Dataset
A machine learning model's performance is evaluated using a loss function, L(θ), where θ represents the model's parameters. A lower loss value indicates better performance. The training objective is to find the optimal parameters, θ̃, using the formula: θ̃ = arg min_θ L(θ). Given the following loss values for different parameter settings: L(θ=1) = 0.8, L(θ=2) = 0.3, L(θ=3) = 0.1, L(θ=4) = 0.5. Which statement correctly interprets the training objective?
A data scientist trains two models, Model X and Model Y, on the same dataset for the same task. The training objective for each is to find the set of parameters, θ, that minimizes a loss function, L(θ), according to the principle: θ̃ = arg min_θ L(θ). After training, the results are as follows:
- For Model X, the lowest achieved loss is 50, using parameters θ_X.
- For Model Y, the lowest achieved loss is 100, using parameters θ_Y.
Based only on this information and the definition of the training objective, what is the most valid conclusion?
Evaluating a Training Conclusion
Optimal Reward Model Parameter Estimation
A reward model is being trained using a loss function calculated as the negative log of a sigmoid function applied to the difference in scores between a preferred response and a rejected response. For a single training instance, the model outputs a score for the preferred response and a score for the rejected response. How will this outcome influence the model's parameter update for this step?
Reward Model Loss Contribution Analysis
Rationale for Reward Score Difference
Learn After
A machine learning engineer is training a reward model to align with human preferences. The objective is to find the set of parameters, denoted by φ, that minimizes the loss function L_r(φ). After testing several parameter sets, the engineer recorded the following results:
- φ_1: L_r(φ_1) = 0.69
- φ_2: L_r(φ_2) = 0.35
- φ_3: L_r(φ_3) = 0.51
- φ_4: L_r(φ_4) = 0.42
Given the optimization goal expressed as φ̂ = arg min_φ L_r(φ), which parameter set should the engineer select as the optimal one?
Interpreting the Reward Model Optimization Objective
Analyzing Reward Model Training Performance
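Referring back to the engineer's scenario in the Learn After question above, the arg min reduces to selecting the candidate with the smallest recorded loss. A minimal sketch (the dictionary layout and names are illustrative):

```python
# Recorded losses L_r(phi) for the candidate parameter sets from the
# scenario above.
recorded = {"phi_1": 0.69, "phi_2": 0.35, "phi_3": 0.51, "phi_4": 0.42}

# phi_hat = arg min_phi L_r(phi): pick the candidate with the smallest loss.
phi_hat = min(recorded, key=recorded.get)
print(phi_hat, recorded[phi_hat])  # -> phi_2 0.35
```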