Definition

Composition of Reward Model Parameters (ϕ)

In the context of the reward model loss function, the parameters denoted by ϕ encompass all trainable elements of the model. Specifically, this includes the parameters of the underlying Transformer decoder architecture as well as the weights of the final linear mapping matrix, Wr, which transforms the model's internal representations into a scalar reward score.

0

1

Updated 2025-10-07

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences