Case Study

Debugging a Reward Model's Input Formulation

A development team is building a model to score the quality of chatbot-generated answers. To simplify the process, they train their model using only the chatbot's answers as input, labeling each as 'high-quality' or 'low-quality'. After training, they observe that the model performs poorly. For example, it rates the answer 'I am not sure' as low-quality, even when the original user question was an unanswerable philosophical query. Analyze this situation and identify the fundamental flaw in the team's input formulation. Explain why this flaw leads to poor performance.
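The flaw in the formulation can be made concrete with a small sketch. The data below is hypothetical and purely illustrative: it shows how an answer-only input assigns contradictory labels to identical inputs, while a (question, answer) pair keeps the labels consistent.

```python
# Toy illustration of the flaw (hypothetical data): the same answer can be
# high- or low-quality depending on the question it responds to, so a model
# that sees only the answer receives contradictory labels for identical inputs.

training_data = [
    # (question, answer, label)
    ("What is the meaning of life?", "I am not sure", "high-quality"),  # honest hedge
    ("What is 2 + 2?",               "I am not sure", "low-quality"),   # evasive
]

# Answer-only formulation: identical inputs map to conflicting labels,
# so no function of the answer alone can fit this data.
answer_only = {}
for question, answer, label in training_data:
    answer_only.setdefault(answer, set()).add(label)
conflicts = {a: sorted(labels) for a, labels in answer_only.items() if len(labels) > 1}
print(conflicts)  # {'I am not sure': ['high-quality', 'low-quality']}

# (Question, answer) formulation: the inputs are now distinguishable,
# so consistent labels exist and the model can learn context-dependent quality.
pair_input = {(q, a): label for q, a, label in training_data}
print(pair_input[("What is the meaning of life?", "I am not sure")])  # high-quality
```

The conflict dictionary is the heart of the issue: quality is a property of the answer *in context*, so any input formulation that drops the question makes the labels inconsistent by construction.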


Updated 2025-10-10

Tags: Ch.2 Generative Models - Foundations of Large Language Models; Ch.4 Alignment - Foundations of Large Language Models; Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Analysis in Bloom's Taxonomy; Cognitive Psychology; Psychology; Social Science; Empirical Science; Science