A model is being trained to evaluate text completions for a given prompt. The training data consists of pairs of completions for each prompt, where one is marked as 'preferred' ($y_w$) and the other as 'dispreferred' ($y_l$) by human reviewers. The model learns by minimizing the following loss function, averaged over all pairs in the dataset:
$$L = -\log \sigma\big(r(y_w) - r(y_l)\big)$$
where $\sigma$ is the sigmoid function and $r(y)$ is the value the model assigns to a completion $y$.
What is the primary effect of minimizing this loss function on the scores the model assigns?
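To explore the question numerically, the following is a minimal sketch that assumes the loss has the standard pair-wise ranking form, the negative log-sigmoid of the preferred score minus the dispreferred score. The function names and the example scores are hypothetical and chosen only for illustration.

```python
import math


def pairwise_ranking_loss(score_preferred: float, score_dispreferred: float) -> float:
    """Assumed loss for one pair: -log(sigmoid(score_preferred - score_dispreferred)).

    Uses the identity -log(sigmoid(m)) = log(1 + exp(-m)), evaluated in a
    form that avoids overflow for large positive or negative margins m.
    """
    margin = score_preferred - score_dispreferred
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def mean_pairwise_loss(pairs):
    """Average the per-pair loss over (preferred, dispreferred) score pairs."""
    return sum(pairwise_ranking_loss(p, d) for p, d in pairs) / len(pairs)


# Hypothetical scores: only the difference between the two scores matters.
for preferred, dispreferred in [(0.0, 2.0), (0.0, 0.0), (2.0, 0.0), (12.0, 10.0)]:
    loss = pairwise_ranking_loss(preferred, dispreferred)
    print(f"margin={preferred - dispreferred:+.1f}  loss={loss:.4f}")
```

Under this assumed form, the loss depends only on the gap between the two scores, not on their absolute values, and it shrinks as the preferred completion's score rises above the dispreferred one's. Equal scores give a loss of log 2, about 0.6931.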
Tags
Ch.4 Alignment - Foundations of Large Language Models
Related
Calculating Pair-wise Ranking Loss
Analyzing Reward Model Loss Behavior