Example

Example of Outcome-Based Reward for a Mathematical Task

An example of an effective application for outcome-based rewards is in tasks involving mathematical calculations. In this scenario, a reward model can be trained to provide a positive reward if the model's final answer is correct and a negative reward if it is incorrect, as the correctness can be easily verified from the final output alone.

0

1

Updated 2025-10-09

Contributors are:

Who are from:

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences