Learn Before
Analysis of a Weighted Ranking Loss
A team is training a reward model using preference data. They are considering two different loss formulations for a single data sample consisting of a prompt x, a preferred response y_pref, and a rejected response y_rej.
Loss Formulation A:
Loss = -log(Sigmoid(R(x, y_pref) - R(x, y_rej)))
Loss Formulation B:
Loss = -Pr(y_pref ≻ y_rej | x) * log(Sigmoid(R(x, y_pref) - R(x, y_rej)))
Explain the practical implication of including the Pr(y_pref ≻ y_rej | x) term in Loss Formulation B. How does this term change the way the model learns from the preference data compared to using Loss Formulation A?
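The difference between the two formulations can be seen numerically. Below is a minimal sketch assuming Pr(y_pref ≻ y_rej | x) is taken to be the empirical labeler agreement for the pair (e.g. the fraction of labelers who preferred y_pref); the helper names `loss_a` and `loss_b` are illustrative, not from the original text.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_a(r_pref, r_rej):
    # Formulation A: standard pair-wise ranking loss
    return -math.log(sigmoid(r_pref - r_rej))

def loss_b(r_pref, r_rej, p_pref):
    # Formulation B: the same loss scaled by the preference
    # probability, so contested pairs contribute a smaller
    # gradient than near-unanimous pairs
    return -p_pref * math.log(sigmoid(r_pref - r_rej))

# Same score gap R(x, y_pref) - R(x, y_rej) = -1.0 in every case
r_pref, r_rej = 2.0, 3.0
print(loss_a(r_pref, r_rej))        # full-strength signal (~1.313)
print(loss_b(r_pref, r_rej, 0.9))   # near-unanimous pair: 90% of A's signal
print(loss_b(r_pref, r_rej, 0.55))  # contested pair: down-weighted to 55%
```

Because the weight multiplies the whole loss, it multiplies the gradient by the same factor: Formulation B makes the model learn most strongly from pairs where the preference is clear and only weakly from pairs where labelers nearly disagree.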
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A reward model is being trained using a pair-wise ranking loss function. For a given prompt x, the preference dataset contains a pair of responses: a preferred response y_pref and a rejected response y_rej. Initially, the model assigns the following scores: R(x, y_pref) = 2.0 and R(x, y_rej) = 3.0. Based on the objective of the loss function, what is the most likely change to these scores after a single optimization step on this data point?
Analysis of a Weighted Ranking Loss
Handling Labeler Disagreement in Reward Modeling