Case Study

Debugging a Reward Model's Input Formulation

A development team is building a model to score the quality of chatbot-generated answers. To simplify the process, they train their model using only the chatbot's answers as input, labeling each as 'high-quality' or 'low-quality'. After training, they observe that the model performs poorly. For example, it rates the answer 'I am not sure' as low-quality, even when the original user question was an unanswerable philosophical query. Analyze this situation and identify the fundamental flaw in the team's input formulation. Explain why this flaw leads to poor performance.
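The flaw in the formulation can be made concrete with a small sketch. The data below is hypothetical and purely illustrative: it shows how an answer-only input assigns contradictory labels to identical inputs, while a (question, answer) pair keeps the labels consistent.

```python
# Toy illustration of the flaw (hypothetical data): the same answer can be
# high- or low-quality depending on the question it responds to, so a model
# that sees only the answer receives contradictory labels for identical inputs.

training_data = [
    # (question, answer, label)
    ("What is the meaning of life?", "I am not sure", "high-quality"),  # honest hedge
    ("What is 2 + 2?",               "I am not sure", "low-quality"),   # evasive
]

# Answer-only formulation: identical inputs map to conflicting labels,
# so no function of the answer alone can fit this data.
answer_only = {}
for question, answer, label in training_data:
    answer_only.setdefault(answer, set()).add(label)
conflicts = {a: sorted(labels) for a, labels in answer_only.items() if len(labels) > 1}
print(conflicts)  # {'I am not sure': ['high-quality', 'low-quality']}

# (Question, answer) formulation: the inputs are now distinguishable,
# so consistent labels exist and the model can learn context-dependent quality.
pair_input = {(q, a): label for q, a, label in training_data}
print(pair_input[("What is the meaning of life?", "I am not sure")])  # high-quality
```

The conflict dictionary is the heart of the issue: quality is a property of the answer *in context*, so any input formulation that drops the question makes the labels inconsistent by construction.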


Updated 2025-10-10

Tags: Ch.2 Generative Models - Foundations of Large Language Models; Ch.4 Alignment - Foundations of Large Language Models; Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Analysis in Bloom's Taxonomy; Cognitive Psychology; Psychology; Social Science; Empirical Science; Science