Learn Before
Identifying Reward Model Inputs and Output
A language model is given the prompt: "Summarize the plot of Hamlet in one sentence." It generates the response: "A young prince feigns madness to avenge his father's murder by his uncle." A component designed to evaluate this response assigns it a numerical score of 2.5. Identify the two specific inputs that are fed into this evaluation component and describe the nature of its single output.
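The shape of such an evaluation component can be sketched in a few lines of Python. This is only an illustration of the interface, not a trained model: the function name `toy_reward_model` and its scoring heuristic are hypothetical placeholders chosen so the sketch runs on its own.

```python
def toy_reward_model(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a reward model (NOT a trained network)."""
    # Input 1: the prompt given to the language model.
    # Input 2: the response the language model generated for that prompt.
    # A real reward model would process both with a neural network; the
    # heuristic below is an assumption made only to keep the sketch runnable.
    sentence_count = sum(response.count(mark) for mark in ".!?")
    asks_for_one_sentence = "one sentence" in prompt.lower()

    score = 1.0
    if asks_for_one_sentence and sentence_count == 1:
        score += 1.5  # bonus for following the one-sentence instruction
    return score  # Output: a single real-valued scalar


prompt = "Summarize the plot of Hamlet in one sentence."
response = "A young prince feigns madness to avenge his father's murder by his uncle."
score = toy_reward_model(prompt, response)
print(score)
```

Whatever happens inside the function, the signature is the point: two pieces of text go in and one number comes out.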
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Notation for the RLHF Reward Model
A system designed to improve language model outputs uses a special component. This component takes a user's initial text (a prompt) and a model-generated response, then outputs a single numerical score. If this component processes two different responses for the exact same prompt, giving 'Response A' a score of 4.1 and 'Response B' a score of -0.5, what is the most accurate interpretation of these scores?
Identifying Reward Model Inputs and Output
Troubleshooting a Flawed Reward Model
Semantic Completeness in RLHF Reward Models
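For the related question comparing Response A (4.1) and Response B (-0.5), one common way to read two scores for the same prompt is through their difference rather than their absolute values. The sketch below assumes a Bradley-Terry-style preference model, which the text above does not specify; the function name `preference_probability` is hypothetical.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Probability that A is preferred over B, ASSUMING a Bradley-Terry-style
    model in which preference depends only on the score difference."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


score_a, score_b = 4.1, -0.5  # scores from the related question
p_a_over_b = preference_probability(score_a, score_b)
print(f"P(A preferred over B) = {p_a_over_b:.3f}")
```

Under this assumption, a higher score means only that Response A is judged better than Response B for this prompt; the raw values 4.1 and -0.5 have no fixed meaning on their own.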