Learn Before
Reward Model Score Adjustment
Based on the provided scenario, explain how the training process will adjust the reward model's scores for Completion X and Completion Y. Describe the principle guiding this adjustment.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
During the training of a reward model, a human is shown two responses to a prompt. The human indicates a preference for Response B over Response A. However, the reward model assigns a higher score to Response A than to Response B. Based on the core principle of the training process for this model, what is the most likely immediate outcome?
Reward Model Score Adjustment
Principle of Reward Model Adjustment
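The adjustment these items describe can be sketched numerically. The snippet below is a minimal illustration, not the course's own method: it assumes the commonly used pairwise (Bradley-Terry style) loss, -log(sigmoid(r_preferred - r_rejected)), and it treats the two scores as directly adjustable numbers, whereas a real reward model would update network weights. The starting scores, the learning rate, and all function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pairwise_loss(score_preferred: float, score_rejected: float) -> float:
    # Assumed loss: small when the preferred response scores well above the rejected one.
    return -math.log(sigmoid(score_preferred - score_rejected))


def gradient_step(score_preferred: float, score_rejected: float, lr: float = 0.5):
    # The gradient of the assumed loss is -(1 - sigmoid(diff)) for the preferred
    # score and +(1 - sigmoid(diff)) for the rejected score, so one descent step
    # raises the former and lowers the latter by the same amount.
    push = 1.0 - sigmoid(score_preferred - score_rejected)
    return score_preferred + lr * push, score_rejected - lr * push


# Hypothetical starting point: the model wrongly scores A above B,
# while the human preferred B.
score_a, score_b = 2.0, 0.5

loss_before = pairwise_loss(score_b, score_a)
new_b, new_a = gradient_step(score_b, score_a)
loss_after = pairwise_loss(new_b, new_a)

print(f"A: {score_a:.3f} -> {new_a:.3f}")
print(f"B: {score_b:.3f} -> {new_b:.3f}")
print(f"loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Under these assumptions, the update is largest when the model disagrees most strongly with the human label, and it shrinks toward zero once the preferred response already scores well above the rejected one.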