A researcher is training a reward model on a small preference dataset that contains exactly two preference pairs:
- For the first input, one response is preferred over the other.
- For the second input, one response is preferred over the other.
Given the empirical loss formula for reward model training, which of the following expressions correctly represents the loss for this specific dataset?
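The loss formula itself did not survive in the text, so the following is only a minimal sketch. It assumes the commonly used Bradley-Terry pairwise ranking loss, in which each preference pair contributes -log sigmoid(r_preferred - r_rejected) and the empirical loss is the average of those terms over the dataset. The function names and the reward scores are hypothetical, chosen purely for illustration.

```python
import math


def pairwise_log_loss(reward_preferred, reward_rejected):
    """Return -log(sigmoid(r_preferred - r_rejected)) for one preference pair.

    Assumption: Bradley-Terry style loss; the source formula is not shown.
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def empirical_loss(pairs):
    """Average the per-pair loss over the dataset (the 1/|D| factor)."""
    return sum(pairwise_log_loss(w, l) for w, l in pairs) / len(pairs)


# Hypothetical reward scores (preferred, rejected) for the two pairs.
two_pairs = [(1.2, 0.3), (0.1, 0.4)]
loss = empirical_loss(two_pairs)
print(f"empirical loss over {len(two_pairs)} pairs: {loss:.4f}")
```

With exactly two pairs, the sum has two terms and the normalizing factor is 1/2, which is the structure the question asks the reader to identify.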
Tags
Ch.4 Alignment - Foundations of Large Language Models
Related
Impact of Data Distribution on Reward Model Training
Comparing Reward Model Performance