1Cademy - Example of Outcome-Based Reward for a Mathematical Task

Learn Before

Limitations of Outcome-Based Rewards for Entire Sequences

Example

Example of Outcome-Based Reward for a Mathematical Task

Mathematical calculation is a task where outcome-based rewards work well. Because correctness can be verified from the final output alone, a reward model can be trained to assign a positive reward when the model's final answer is correct and a negative reward when it is incorrect.
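As a rough illustration of the idea, the sketch below scores only the final answer and ignores any intermediate reasoning. It is a simplification under stated assumptions: it uses a hand-written rule-based check rather than a trained reward model, the function names `parse_answer` and `outcome_reward` are hypothetical, the reward values +1.0 and -1.0 are arbitrary choices, and answers are assumed to be plain numbers or fractions written as strings.

```python
from fractions import Fraction


def parse_answer(text):
    """Parse a final-answer string into an exact Fraction, or None if malformed.

    Hypothetical helper: assumes answers look like "42", "0.5", or "1/2".
    """
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None


def outcome_reward(final_answer, reference):
    """Return +1.0 if the final answer matches the reference, else -1.0.

    Only the final output is inspected; the steps that produced it are not.
    The +1.0 / -1.0 values are illustrative, not prescribed by the source.
    """
    expected = parse_answer(reference)
    if expected is None:
        raise ValueError("reference answer must be numeric")
    predicted = parse_answer(final_answer)
    # A malformed answer parses to None, which never equals the reference.
    return 1.0 if predicted == expected else -1.0


if __name__ == "__main__":
    print(outcome_reward("0.5", "1/2"))  # equivalent values, positive reward
    print(outcome_reward("41", "42"))    # wrong value, negative reward
```

Comparing exact fractions rather than raw strings means equivalent forms of the same number receive the same reward, which keeps the signal unambiguous.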

Updated 2025-10-09

References

Reference of Foundations of Large Language Models Course

Learn After

A team of engineers is developing a system to automatically provide feedback to a language model during its training phase. They decide to use a simple reward mechanism: a positive signal is given if the model's final output is correct, and a negative signal is given if it is incorrect. For which of the following tasks would this reward mechanism be the most effective and least ambiguous?
Reward Strategy for a Mathematical AI
Evaluating a Reward Strategy for an AI Math Tutor
