Learn Before
Classification of Reward Models for LLM Reasoning
A crucial aspect of applying reinforcement learning to reasoning is the design of the reward model, which can be categorized by its evaluation target. One type is the 'outcome reward model,' which assesses only the quality or correctness of the final answer. Another is the 'process reward model,' conceptually similar to a step-level verifier, which evaluates each intermediate step in the reasoning chain. A third approach is the 'rule-based reward model,' which can be implemented using simple heuristics, such as rewarding longer outputs to encourage more detailed reasoning; heuristics like this are easy to implement but can also be exploited by the model.
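The three categories above can be contrasted with a minimal sketch. The function names and scoring rules here are illustrative assumptions, not an implementation from the course:

```python
# Illustrative sketch of the three reward-model types; all names and
# scoring rules are hypothetical choices for demonstration.

def outcome_reward(final_answer: str, reference: str) -> float:
    """Outcome reward model: scores only the final answer."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0

def process_reward(steps: list[str], step_is_valid) -> float:
    """Process reward model: scores each intermediate reasoning step,
    like a step-level verifier, then averages the step scores."""
    scores = [1.0 if step_is_valid(s) else -1.0 for s in steps]
    return sum(scores) / len(scores)

def rule_based_reward(output: str, max_words: int = 512) -> float:
    """Rule-based reward model: a simple length heuristic that rewards
    longer outputs -- cheap to compute, but gameable."""
    return min(len(output.split()), max_words) / max_words
```

Note how only the process reward model sees the individual steps; the outcome and rule-based variants look at the finished output alone, which is why a length heuristic can reward long but incorrect reasoning.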
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Classification of Reward Models for LLM Reasoning
A research team is fine-tuning a language model to solve multi-step logic puzzles. They use a reinforcement learning approach where a reward model provides feedback. After several training cycles, the team observes that the language model generates extremely detailed and lengthy reasoning paths, but its final conclusions are almost always incorrect. Which of the following is the most probable explanation for this outcome?
A team of AI researchers is using a reinforcement learning process to improve a large language model's ability to generate high-quality, step-by-step solutions to complex problems. Arrange the following key stages of a single training iteration into the correct chronological order.
Analyzing a Flawed Reinforcement Learning Setup
Importance of Step-by-Step Supervision for Complex Reasoning
Learn After
Outcome Reward Models
Process Reward Models
Rule-Based Reward Models for Reasoning
A team is training a language model to solve multi-step logic puzzles. Their training system automatically reviews each line of the model's generated reasoning. If a line represents a valid deductive step, it receives a positive score. If a line contains a logical fallacy or contradicts a previous statement, it receives a negative score, and the evaluation stops. The total score for the entire reasoning path is then used to update the model. Which classification best describes this type of feedback mechanism?
Selecting a Reward Model for a Math Tutoring LLM
Match each description of a feedback mechanism for training a reasoning model with the most appropriate classification.