
Classification of Reward Models for LLM Reasoning

A crucial aspect of applying reinforcement learning to LLM reasoning is the design of the reward model, which can be categorized by its evaluation target. One type is the 'outcome reward model,' which assesses only the quality or correctness of the final answer. Another is the 'process reward model,' conceptually similar to a step-level verifier, which evaluates each intermediate step in the reasoning chain. A third approach is the 'rule-based reward model,' which can be implemented with simple heuristics, such as rewarding longer outputs to encourage more detailed reasoning.
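The three categories above can be sketched as scoring functions. This is a minimal illustrative sketch, not an implementation from the text: the function names, the exact-match outcome check, the `step_verifier` callable standing in for a learned step-level verifier, and the length cap in the rule-based heuristic are all assumptions for the example.

```python
from typing import Callable, List

def outcome_reward(final_answer: str, reference_answer: str) -> float:
    """Outcome reward model: scores only the final answer's correctness
    (here a simple exact-match check, an assumption for illustration)."""
    return 1.0 if final_answer.strip() == reference_answer.strip() else 0.0

def process_reward(steps: List[str], step_verifier: Callable[[str], float]) -> float:
    """Process reward model: scores each intermediate reasoning step and
    averages; step_verifier is a placeholder for a learned step-level verifier."""
    if not steps:
        return 0.0
    return sum(step_verifier(step) for step in steps) / len(steps)

def rule_based_reward(output: str, target_length: int = 200) -> float:
    """Rule-based reward model: a simple heuristic, here rewarding longer
    outputs (capped at target_length) to encourage more detailed reasoning."""
    return min(len(output), target_length) / target_length

# Example: a two-step chain-of-thought for a small arithmetic problem.
steps = ["2 + 3 = 5", "5 * 4 = 20"]
print(outcome_reward("20", "20"))                      # final answer matches
print(process_reward(steps, lambda s: 1.0))            # every step accepted
print(rule_based_reward(" ".join(steps)))              # short output, partial reward
```

Note the trade-off the classification implies: the outcome reward is cheap but gives no credit for correct intermediate work, while the process reward requires a per-step verifier but provides denser feedback.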


Updated 2026-05-06


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences