Learn Before
Troubleshooting a Flawed Reward Model
Based on the fundamental design of a reward model in this learning framework, what is the critical error in the engineer's approach, and why does it lead to the observed problem?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Notation for the RLHF Reward Model
A system designed to improve language model outputs uses a special component. This component takes a user's initial text (a prompt) and a model-generated response, then outputs a single numerical score. If this component processes two different responses for the exact same prompt, giving 'Response A' a score of 4.1 and 'Response B' a score of -0.5, what is the most accurate interpretation of these scores?
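The scores in this question are comparative: they only carry meaning relative to other responses for the same prompt, where a higher score indicates a more preferred response. One common way such scores are used (a minimal sketch, assuming a Bradley-Terry-style preference model as in standard RLHF formulations; the function name and values are illustrative, not from the question) is to convert a score difference into a preference probability:

```python
import math

def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that Response A is preferred over
    Response B, given scalar reward-model scores for the same prompt.

    Only the difference between the scores matters, which is why the
    absolute values (e.g. 4.1 vs. -0.5) have no standalone meaning.
    """
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))

# Scores from the question: Response A = 4.1, Response B = -0.5.
# The large gap (4.6) implies A is strongly preferred for this prompt.
p = preference_probability(4.1, -0.5)

# Equal scores yield no preference either way.
tie = preference_probability(2.0, 2.0)
```

Note that shifting both scores by the same constant leaves the preference probability unchanged, which is the formal sense in which the scores are relative rather than absolute.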
Identifying Reward Model Inputs and Output
Troubleshooting a Flawed Reward Model
Semantic Completeness in RLHF Reward Models