1Cademy - Example of an Outcome-Based Reward Model in Mathematics

Learn Before

Outcome-Based Reward Models

Example

Example of an Outcome-Based Reward Model in Mathematics

A practical application of an outcome-based reward model is evaluating mathematical calculations. In this setting, the reward model gives positive feedback for a correct final answer and negative feedback for an incorrect one, without assessing the intermediate steps of the calculation.
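The behavior described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function names `extract_final_answer` and `outcome_reward`, the `Final Answer:` response format, and the reward values of +1.0 and -1.0 are all assumptions made for the example.

```python
import re
from typing import Optional


def extract_final_answer(response: str) -> Optional[float]:
    """Return the number that follows 'Final Answer:', or None if it is missing.

    The 'Final Answer:' marker is an assumed response format for this sketch.
    """
    match = re.search(r"Final Answer:\s*(-?\d+(?:\.\d+)?)", response)
    return float(match.group(1)) if match else None


def outcome_reward(response: str, reference: float, tol: float = 1e-9) -> float:
    """Score a response using only its final answer.

    Intermediate steps in the response are never inspected.
    The +1.0 / -1.0 reward values are assumed for illustration.
    """
    answer = extract_final_answer(response)
    if answer is None:
        return -1.0  # no parsable final answer counts as incorrect
    return 1.0 if abs(answer - reference) <= tol else -1.0


# Hypothetical problem: (3 + 4) * 2, whose correct result is 14.
sound = "Step 1: 3 + 4 = 7 Step 2: 7 * 2 = 14 Final Answer: 14"
flawed_steps = "Step 1: 3 + 4 = 8 Step 2: 8 * 2 = 14 Final Answer: 14"
wrong_answer = "Step 1: 3 + 4 = 7 Step 2: 7 * 2 = 16 Final Answer: 16"

print(outcome_reward(sound, 14))         # correct final answer
print(outcome_reward(flawed_steps, 14))  # bad step, but correct final answer
print(outcome_reward(wrong_answer, 14))  # incorrect final answer
```

In this sketch, the first two responses receive the same positive reward even though the second contains an arithmetic slip in Step 1, because only the final answer is compared against the reference.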

Updated 2025-10-10


References

Reference of Foundations of Large Language Models Course

Learn After

A language model is being trained to solve math problems. The training process uses a reward system that provides feedback based only on whether the final numerical answer is correct or incorrect. The model is given the problem (5 * 4) + (10 / 2) and produces the following reasoning:

Step 1: 5 * 4 = 20
Step 2: 10 / 2 = 4
Step 3: 20 + 4 = 24
Final Answer: 24

How would this reward system evaluate the model's entire response?
Evaluating a Reward Mechanism for a Financial AI
Evaluating a Flawed Mathematical Reasoning Process
