Learn Before
Composition of Reward Model Parameters (ϕ)
In the context of the reward model loss function, the parameters denoted by ϕ encompass all trainable elements of the model. Specifically, this includes the parameters of the underlying Transformer decoder architecture as well as the weights of the final linear mapping matrix, Wr, which transforms the model's internal representations into a scalar reward score.
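A minimal PyTorch sketch of this composition, assuming a generic stand-in decoder backbone; the module names, vocabulary size, and hidden dimension below are illustrative, not taken from any particular codebase:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Transformer decoder backbone plus a final linear mapping W_r to a scalar reward."""
    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone              # pre-trained Transformer decoder
        self.w_r = nn.Linear(hidden_size, 1)  # the linear mapping matrix W_r

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids)              # (batch, seq_len, hidden_size)
        return self.w_r(hidden[:, -1, :]).squeeze(-1)  # one scalar reward per sequence

# Toy stand-in for the decoder backbone; in practice this would be a pre-trained LLM.
hidden_size = 64
backbone = nn.Sequential(
    nn.Embedding(1000, hidden_size),
    nn.TransformerEncoderLayer(d_model=hidden_size, nhead=4, batch_first=True),
)
model = RewardModel(backbone, hidden_size)

# phi = every trainable parameter: the backbone's weights plus W_r.
phi = [p for p in model.parameters() if p.requires_grad]
print(f"total trainable parameters in phi: {sum(p.numel() for p in phi)}")
```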
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Pair-wise Ranking Loss Formula for RLHF Reward Model
Empirical Reward Model Loss Formula using Bradley-Terry Model
A reward model is trained to learn human preferences by minimizing the following loss function, which is an expectation over a preference dataset D:

L(ϕ) = −E_{(x, y_w, y_l) ∼ D} [ log σ( r_ϕ(x, y_w) − r_ϕ(x, y_l) ) ]

In this dataset, y_w represents a response preferred over response y_l for a given input x. What is the primary effect of successfully minimizing this loss function on the model's behavior?
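A brief sketch of how this loss could be computed in PyTorch, assuming the reward model returns one scalar per (input, response) sequence; the function and tensor names are illustrative:

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """-log sigma(r_phi(x, y_w) - r_phi(x, y_l)), averaged over the batch."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Example rewards for preferred (y_w) and dispreferred (y_l) responses.
r_w = torch.tensor([1.2, 0.3, 2.0])
r_l = torch.tensor([0.5, 0.9, 1.1])
loss = pairwise_ranking_loss(r_w, r_l)
# Minimizing this pushes r_phi(x, y_w) above r_phi(x, y_l) for each preference pair.
```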
Reward Model Training Diagnosis
Composition of Reward Model Parameters (ϕ)
Approximating Expected Loss with Empirical Loss
Empirical Reward Model Loss Formula
Impact of Prediction Confidence on Reward Model Loss
Learn After
A reward model is constructed by taking a large, pre-trained language model and adding a new linear layer on top to output a single scalar value. To train this model efficiently, an engineer freezes the weights of the pre-trained language model and only updates the weights of the new linear layer. How does this training strategy relate to the complete set of the reward model's parameters, denoted as ϕ?
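A short sketch of that frozen-backbone setup, building on the illustrative RewardModel from the earlier sketch (an assumption for demonstration, not a prescribed implementation):

```python
import torch

# Freeze the pre-trained backbone; only the new linear head remains trainable.
for p in model.backbone.parameters():
    p.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]
# `trainable` now holds only W_r, a strict subset of phi.
# phi itself is still defined as the full set (backbone weights + W_r);
# this strategy simply chooses to update only part of it.
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```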
Components of Reward Model Parameters
In the context of training a reward model, the parameter set ϕ, which is optimized to minimize the loss function, consists solely of the weights of the final linear layer responsible for mapping the model's internal representations to a scalar reward score.