1Cademy - Outcome Reward Models


Outcome Reward Models

An outcome reward model (ORM) is a verifier used in reinforcement learning for large language models (LLMs) that evaluates only the final answer of a reasoning process. It judges the correctness or overall quality of the end result and returns a single reward signal based solely on that judgment. The intermediate reasoning steps do not receive rewards of their own.
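The following is a minimal sketch of the outcome-reward idea. It assumes a task with a known reference answer and a hypothetical output convention in which the model ends its reasoning with a line of the form `Answer: ...`. A rule-based exact-match check stands in for the outcome reward model here. In practice an ORM may be a trained model that outputs a score, but the interface is the same: the whole completion goes in, and one scalar describing the final result comes out. The function names `extract_final_answer`, `outcome_reward`, and `per_step_rewards` are illustrative and do not come from any library.

```python
import re
from typing import List, Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent.

    The 'Answer:' convention is a hypothetical output format assumed for this sketch.
    """
    matches = re.findall(r"Answer:\s*(.+)", completion)
    if not matches:
        return None
    return matches[-1].strip()


def outcome_reward(completion: str, reference: str) -> float:
    """Score only the final answer: 1.0 if it matches the reference, else 0.0.

    Everything before the final answer is ignored. Exact string matching is a
    stand-in for a learned outcome reward model.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no final answer found, so no reward
    return 1.0 if answer == reference.strip() else 0.0


def per_step_rewards(steps: List[str], reference: str) -> List[float]:
    """Assign the outcome reward to a multi-step trajectory.

    Every intermediate step gets 0.0, and the last step carries the whole reward.
    """
    if not steps:
        return []
    final = outcome_reward("\n".join(steps), reference)
    return [0.0] * (len(steps) - 1) + [final]


if __name__ == "__main__":
    good = ["Step 1: 12 * 3 = 36", "Step 2: 36 + 6 = 42", "Answer: 42"]
    # The arithmetic in this chain is wrong, but it still ends at 42.
    bad_reasoning = ["Step 1: 12 * 3 = 30", "Step 2: 30 + 12 = 42", "Answer: 42"]

    print(per_step_rewards(good, "42"))                        # [0.0, 0.0, 1.0]
    print(outcome_reward("\n".join(bad_reasoning), "42"))      # 1.0
```

Because only the final answer is checked, the flawed chain in `bad_reasoning` earns the same reward as the correct one. This follows directly from basing the reward on the end result alone.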

Updated 2026-05-06
