Learn Before
Principle of Reward Model Adjustment
Imagine a system is being trained to prefer certain text outputs over others based on human feedback. If a human indicates that 'Output X' is better than 'Output Y', but the system initially assigns a higher score to 'Output Y', explain the fundamental principle that guides the adjustment of the system's scoring mechanism during its next training step.
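The principle in question is that the scoring mechanism is nudged so the preferred output's score rises relative to the rejected one. The following is a minimal sketch of one common way to formalize this, a pairwise (Bradley-Terry style) logistic loss with a hand-coded gradient step; the scores, learning rate, and step count are illustrative assumptions, not values from the card.

```python
import math

def pairwise_loss(score_preferred, score_rejected):
    # Bradley-Terry style loss: -log sigmoid(r_preferred - r_rejected).
    # It is large when the rejected output outscores the preferred one.
    return -math.log(1.0 / (1.0 + math.exp(-(score_preferred - score_rejected))))

# Hypothetical starting scores: the human prefers Output X,
# but the model currently scores Output Y higher.
score_x, score_y = 0.2, 0.8

lr = 0.5  # illustrative learning rate
for _ in range(3):
    # Gradient of the loss w.r.t. the two scores:
    #   d loss / d score_x = -(1 - sigmoid(score_x - score_y))
    #   d loss / d score_y = +(1 - sigmoid(score_x - score_y))
    sig = 1.0 / (1.0 + math.exp(-(score_x - score_y)))
    push = 1.0 - sig      # shrinks toward 0 as the ranking is corrected
    score_x += lr * push  # preferred output's score is pushed up
    score_y -= lr * push  # rejected output's score is pushed down
```

After a few such steps the preferred output's score exceeds the rejected one's, and the update magnitude decays as the model's ranking comes to agree with the human preference.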
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
During the training of a reward model, a human is shown two responses to a prompt. The human indicates a preference for Response B over Response A. However, the reward model assigns a higher score to Response A than to Response B. Based on the core principle of the training process for this model, what is the most likely immediate outcome?
Reward Model Score Adjustment
Principle of Reward Model Adjustment