Case Study

Analyzing Feedback for a Multi-Step Reasoning Task

A language model is given the prompt: 'A bakery sells muffins for $3 each and cookies for $2 each. If they sold 15 muffins and 20 cookies on Monday, what was their total revenue?' The model generates the multi-step response below. A reward system that evaluates the entire response with a single score gives it a low rating because the final answer is incorrect. Analyze the primary limitation of this single-score feedback approach for improving the model's reasoning. Then, explain how evaluating each step of the response as a separate unit would provide more useful training data.
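
The contrast between the two feedback schemes can be sketched in a few lines of Python. The model's response is not reproduced in this case study, so the three steps below are a hypothetical response with an arithmetic slip in the second step. The names `Step`, `outcome_reward`, and `step_rewards` are illustrative assumptions, not part of any particular reward-modeling library. Each step is judged on whether it is locally valid given the values it uses.

```python
from dataclasses import dataclass


@dataclass
class Step:
    text: str      # the step as the model wrote it
    claimed: int   # the value the model asserts at this step
    expected: int  # the value a checker computes from the step's own inputs


# Hypothetical response (an assumption, not from the case study):
# the cookie step contains the error, and the final step sums correctly
# from the (wrong) intermediate values.
steps = [
    Step("Muffin revenue: 15 * 3 = 45", claimed=45, expected=15 * 3),
    Step("Cookie revenue: 20 * 2 = 30", claimed=30, expected=20 * 2),
    Step("Total: 45 + 30 = 75", claimed=75, expected=45 + 30),
]

CORRECT_TOTAL = 15 * 3 + 20 * 2  # 85


def outcome_reward(steps: list[Step], correct_total: int) -> float:
    """Single score for the whole response, based only on the final answer."""
    return 1.0 if steps[-1].claimed == correct_total else 0.0


def step_rewards(steps: list[Step]) -> list[float]:
    """One score per step: 1.0 if the step is locally valid, else 0.0."""
    return [1.0 if s.claimed == s.expected else 0.0 for s in steps]


print(outcome_reward(steps, CORRECT_TOTAL))  # one number for the whole response
print(step_rewards(steps))                   # one number per step
```

Under these assumptions, the single score marks the whole response as wrong and penalizes the two valid steps along with the faulty one, whereas the per-step scores isolate the cookie calculation as the only step that needs correcting.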

Updated 2025-09-26

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science