Process Reward Models
A process reward model (PRM) is a verifier used in reinforcement learning for LLMs that scores the quality of each intermediate step in a reasoning path. This provides finer-grained feedback than evaluating only the final outcome, and is conceptually similar to a step-level verifier.
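The contrast with outcome-level feedback can be sketched in a few lines. This is a toy illustration, not a real PRM: actual process reward models are learned classifiers over reasoning steps, whereas `toy_step_scorer` below is a hypothetical stand-in that just checks whether an arithmetic line balances.

```python
# Sketch: process-level vs outcome-level reward for a reasoning trace.
# step_scorer / answer_checker stand in for learned reward models.

def process_reward(steps, step_scorer):
    """Score every intermediate step and return per-step rewards."""
    return [step_scorer(s) for s in steps]

def outcome_reward(steps, answer_checker):
    """Score only the final step (the answer)."""
    return answer_checker(steps[-1])

# Hypothetical step scorer: +1 if an equation balances, -1 if it does
# not, 0 if the line contains no checkable equation.
def toy_step_scorer(step):
    if "=" in step:
        lhs, rhs = step.split("=", 1)
        try:
            return 1.0 if eval(lhs) == eval(rhs) else -1.0
        except Exception:
            return 0.0
    return 0.0

trace = ["2 + 3 = 5", "5 * 4 = 20", "20 - 1 = 19"]
print(process_reward(trace, toy_step_scorer))   # [1.0, 1.0, 1.0]
print(outcome_reward(trace, toy_step_scorer))   # 1.0
```

Note how a trace with a wrong intermediate step but a correct final line would still earn full outcome reward while the process reward exposes the flawed step.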
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Outcome Reward Models
Process Reward Models
Rule-Based Reward Models for Reasoning
A team is training a language model to solve multi-step logic puzzles. Their training system automatically reviews each line of the model's generated reasoning. If a line represents a valid deductive step, it receives a positive score. If a line contains a logical fallacy or contradicts a previous statement, it receives a negative score, and the evaluation stops. The total score for the entire reasoning path is then used to update the model. Which classification best describes this type of feedback mechanism?
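The scoring scheme described in this scenario can be sketched directly: score each line, halt at the first invalid step, and sum the per-step scores into one training signal. The `is_valid_step` checker here is hypothetical; in the scenario it would be the team's automatic deduction validator.

```python
# Sketch of the described mechanism: per-line scoring with early stop
# at the first logical fallacy or contradiction.

def score_trace(lines, is_valid_step):
    total = 0.0
    for line in lines:
        if is_valid_step(line):
            total += 1.0
        else:
            total -= 1.0
            break  # evaluation stops at the first invalid step
    return total

# Toy validity check: a line is valid unless it is flagged "bad".
print(score_trace(["ok", "ok", "bad", "ok"], lambda l: l != "bad"))  # 1.0
```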
Selecting a Reward Model for a Math Tutoring LLM
Match each description of a feedback mechanism for training a reasoning model with the most appropriate classification.
Learn After
Reward Model Strategy for a Math Tutoring AI
Comparing AI Training Feedback Strategies
An AI model is being trained to solve complex, multi-step logic puzzles. During training, instead of only being told whether its final answer is correct, the model receives a positive signal for each logically sound deduction it makes along the way, and a negative signal for any step that contains a fallacy, regardless of the final conclusion. Which feedback mechanism does this training process exemplify?