Formula

Optimal Reward Model Parameter Estimation

The goal of training the reward model is to find the set of parameters that minimizes the loss function over the preference dataset. This optimization problem can be expressed formally using the $\arg\min$ operator. Writing $\phi$ for the parameters and $\mathcal{L}_r$ for the loss function, the objective is $\hat{\phi} = \arg\min_{\phi} \mathcal{L}_r(\phi)$. The resulting parameters $\hat{\phi}$ achieve the lowest loss on the preference data, thereby aligning the reward model with the human preference data.
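The objective above can be sketched numerically. The snippet below is a minimal, hypothetical illustration (not the chapter's actual implementation): it assumes a toy linear reward model $r(x) = \phi^\top x$ over feature vectors and a Bradley-Terry-style pairwise loss, $\mathcal{L}_r(\phi) = -\frac{1}{n}\sum_i \log \sigma\big(r(x_i^{+}) - r(x_i^{-})\big)$, which is a common choice for preference data. Gradient descent then approximates $\hat{\phi} = \arg\min_{\phi} \mathcal{L}_r(\phi)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def preference_loss(phi, chosen, rejected):
    # Pairwise loss: -mean log sigma(r(chosen) - r(rejected))
    margins = (chosen - rejected) @ phi
    return -np.mean(np.log(sigmoid(margins)))

def fit(chosen, rejected, lr=0.5, steps=500):
    # Plain gradient descent on the pairwise loss.
    phi = np.zeros(chosen.shape[1])
    for _ in range(steps):
        diff = chosen - rejected
        margins = diff @ phi
        grad = -((1.0 - sigmoid(margins)) @ diff) / len(diff)
        phi -= lr * grad
    return phi

# Synthetic preference pairs generated from a hypothetical "true" phi.
rng = np.random.default_rng(0)
true_phi = np.array([1.0, -2.0, 0.5])
X_chosen = rng.normal(size=(200, 3))
X_rejected = rng.normal(size=(200, 3))
# Relabel so the "chosen" response really scores higher under true_phi.
swap = (X_chosen @ true_phi) < (X_rejected @ true_phi)
X_chosen[swap], X_rejected[swap] = X_rejected[swap].copy(), X_chosen[swap].copy()

phi_hat = fit(X_chosen, X_rejected)
loss_init = preference_loss(np.zeros(3), X_chosen, X_rejected)
loss_fit = preference_loss(phi_hat, X_chosen, X_rejected)
print(loss_init, loss_fit)
```

The fitted $\hat{\phi}$ should drive the loss well below its value at initialization ($-\log 0.5 \approx 0.693$ for $\phi = 0$), illustrating what the $\arg\min$ operation seeks.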


Updated 2026-05-01
