Learn Before
Debugging a Reward Model's Input Formulation
A development team is building a model to score the quality of chatbot-generated answers. To simplify the pipeline, they train the model using only the chatbot's answers as input, labeling each answer as 'high-quality' or 'low-quality'. After training, the model performs poorly: for example, it rates the answer 'I am not sure' as low-quality even when the original user question was an unanswerable philosophical query. Analyze this situation, identify the fundamental flaw in the team's input formulation, and explain why it leads to poor performance.
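For concreteness, here is a minimal sketch contrasting the two input formulations. The score function and the strings below are hypothetical stand-ins for a trained reward model and real data; the point is what text the model receives, not how it scores.

    # Sketch: answer-only input vs. prompt + response input for a reward model.

    def score(text: str) -> float:
        """Hypothetical reward model: maps input text to a quality score."""
        return 0.0  # placeholder; a real model would be a fine-tuned classifier

    prompt = "What happens after we die?"  # unanswerable philosophical query
    response = "I am not sure."            # a reasonable answer *to this prompt*

    # Team's formulation: answer-only input. Stripped of context, hedged
    # answers look uniformly low-quality, so the model learns to penalize them.
    score_answer_only = score(response)

    # Standard formulation: the prompt concatenated with the response, so the
    # model can judge the answer relative to the question it addresses.
    score_with_prompt = score(prompt + "\n" + response)

The design point is that quality is a property of the (prompt, response) pair, not of the response alone; the same response string can be excellent for one prompt and useless for another, which is exactly what the answer-only formulation cannot represent.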
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Sequence Representation for Reward Calculation in RLHF
A team is developing a model to automatically assign a quality score to an AI-generated response. To do this, the model must be given some text as input. Which of the following best explains why the model should be given the original prompt concatenated with the AI's response, instead of just the AI's response alone?
Reward Model Input Preparation