Learn Before
Composition of Reward Model Parameters (ϕ)
In the context of the reward model loss function, the parameters denoted by ϕ encompass all trainable elements of the model. Specifically, this includes the parameters of the underlying Transformer decoder architecture as well as the weights of the final linear mapping matrix, Wr, which transforms the model's internal representations into a scalar reward score.
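A minimal PyTorch sketch of this composition, assuming a generic stand-in decoder backbone; the module names, vocabulary size, and hidden dimension below are illustrative, not taken from any particular codebase:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Transformer decoder backbone plus a final linear mapping W_r to a scalar reward."""
    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone              # pre-trained Transformer decoder
        self.w_r = nn.Linear(hidden_size, 1)  # the linear mapping matrix W_r

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids)              # (batch, seq_len, hidden_size)
        return self.w_r(hidden[:, -1, :]).squeeze(-1)  # one scalar reward per sequence

# Toy stand-in for the decoder backbone; in practice this would be a pre-trained LLM.
hidden_size = 64
backbone = nn.Sequential(
    nn.Embedding(1000, hidden_size),
    nn.TransformerEncoderLayer(d_model=hidden_size, nhead=4, batch_first=True),
)
model = RewardModel(backbone, hidden_size)

# phi = every trainable parameter: the backbone's weights plus W_r.
phi = [p for p in model.parameters() if p.requires_grad]
print(f"total trainable parameters in phi: {sum(p.numel() for p in phi)}")
```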
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Pair-wise Ranking Loss Formula for RLHF Reward Model
Empirical Reward Model Loss Formula using Bradley-Terry Model
A reward model is trained to learn human preferences by minimizing the following loss function, which is an expectation over a preference dataset D:

L(ϕ) = −E_{(x, y_w, y_l) ∼ D} [ log σ( r_ϕ(x, y_w) − r_ϕ(x, y_l) ) ]

In this dataset, y_w represents a response preferred over response y_l for a given input x. What is the primary effect of successfully minimizing this loss function on the model's behavior?
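A brief sketch of how this loss could be computed in PyTorch, assuming the reward model returns one scalar per (input, response) sequence; the function and tensor names are illustrative:

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """-log sigma(r_phi(x, y_w) - r_phi(x, y_l)), averaged over the batch."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Example rewards for preferred (y_w) and dispreferred (y_l) responses.
r_w = torch.tensor([1.2, 0.3, 2.0])
r_l = torch.tensor([0.5, 0.9, 1.1])
loss = pairwise_ranking_loss(r_w, r_l)
# Minimizing this pushes r_phi(x, y_w) above r_phi(x, y_l) for each preference pair.
```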
Reward Model Training Diagnosis
Composition of Reward Model Parameters (ϕ)
Approximating Expected Loss with Empirical Loss
Empirical Reward Model Loss Formula
Impact of Prediction Confidence on Reward Model Loss
Learn After
A reward model is constructed by taking a large, pre-trained language model and adding a new linear layer on top to output a single scalar value. To train this model efficiently, an engineer freezes the weights of the pre-trained language model and only updates the weights of the new linear layer. How does this training strategy relate to the complete set of the reward model's parameters, denoted as ϕ?
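A short sketch of that frozen-backbone setup, building on the illustrative RewardModel from the earlier sketch (an assumption for demonstration, not a prescribed implementation):

```python
import torch

# Freeze the pre-trained backbone; only the new linear head remains trainable.
for p in model.backbone.parameters():
    p.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]
# `trainable` now holds only W_r, a strict subset of phi.
# phi itself is still defined as the full set (backbone weights + W_r);
# this strategy simply chooses to update only part of it.
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```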
Components of Reward Model Parameters
In the context of training a reward model, the parameter set ϕ, which is optimized to minimize the loss function, consists solely of the weights of the final linear layer responsible for mapping the model's internal representations to a scalar reward score.