1Cademy - Analyzing Feedback for a Multi-Step Reasoning Task

Learn Before: Segment-Based Reward Computation

Case Study

A language model is given the prompt: 'A bakery sells muffins for $3 each and cookies for $2 each. If they sold 15 muffins and 20 cookies on Monday, what was their total revenue?' The model generates the multi-step response below. A reward system that evaluates the entire response with a single score gives it a low rating because the final answer is incorrect.

Analyze the primary limitation of this single-score feedback approach for improving the model's reasoning. Then explain how evaluating each step of the response as a separate unit would provide more useful training data.
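The contrast between one score for the whole response and one score per step can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference solution: the three-step decomposition (muffin subtotal, cookie subtotal, sum), the checker functions, and the flawed response `[45, 30, 75]` are hypothetical stand-ins for the model's actual response. Each step is judged for local correctness, meaning against the model's own earlier values rather than against the final reference answer.

```python
from typing import Callable

# Problem data taken from the prompt.
MUFFIN_PRICE, COOKIE_PRICE = 3, 2
MUFFINS_SOLD, COOKIES_SOLD = 15, 20
CORRECT_TOTAL = MUFFIN_PRICE * MUFFINS_SOLD + COOKIE_PRICE * COOKIES_SOLD  # 45 + 40 = 85

# Hypothetical per-step checkers. Each receives all values the model produced
# and returns True if its own step is correct given the model's earlier results.
StepCheck = Callable[[list[int]], bool]
STEP_CHECKS: list[StepCheck] = [
    lambda v: v[0] == MUFFIN_PRICE * MUFFINS_SOLD,  # step 1: muffin revenue
    lambda v: v[1] == COOKIE_PRICE * COOKIES_SOLD,  # step 2: cookie revenue
    lambda v: v[2] == v[0] + v[1],                  # step 3: sum of the two subtotals
]


def outcome_reward(values: list[int]) -> float:
    """Single score for the whole response: depends only on the final answer."""
    return 1.0 if values[-1] == CORRECT_TOTAL else 0.0


def step_rewards(values: list[int]) -> list[float]:
    """One score per step, so an error is attributed to the step that made it."""
    if len(values) != len(STEP_CHECKS):
        raise ValueError(f"expected {len(STEP_CHECKS)} step values, got {len(values)}")
    return [1.0 if check(values) else 0.0 for check in STEP_CHECKS]


# Hypothetical flawed response: the cookie subtotal is miscomputed as 30.
flawed = [45, 30, 75]
print(outcome_reward(flawed))  # 0.0
print(step_rewards(flawed))    # [1.0, 0.0, 1.0]
```

The single score of 0.0 tells the model only that something went wrong, while the per-step scores isolate the error to step 2 and still credit the correct multiplication in step 1 and the correctly performed addition in step 3. Judging step 3 relative to the model's own subtotals is a design choice; checking it against the reference total of 85 would also mark it wrong and blur where the mistake originated.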

Updated 2025-09-26