Process Reward Model (PRM)
A Process Reward Model (PRM) functions as a step-level verifier that assesses the quality of intermediate steps in a reasoning process. It is often realized as a separate language model specifically trained to assign a numerical score, or reward, to each step in a sequence. This approach is particularly effective for incorporating human feedback, since annotators can judge each intermediate step rather than only the final answer.
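To make step-level scoring concrete, here is a minimal sketch of how a PRM can be applied at inference time: each reasoning step is scored given the question and all preceding steps. Everything here is illustrative, not a specific library's API: ToyPRM, score_steps, and the whitespace tokenizer are placeholders, and a real PRM would be a fine-tuned transformer encoder trained on step-level correctness labels.

import torch
import torch.nn as nn

class ToyPRM(nn.Module):
    """Toy stand-in for a trained PRM: encodes a (question, partial solution)
    pair and outputs the probability that the latest step is correct."""
    def __init__(self, vocab_size: int = 50_000, dim: int = 64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # bag-of-tokens encoder (placeholder)
        self.head = nn.Linear(dim, 1)                  # binary "step is correct" head

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: 1-D tensor of token ids for question + steps so far
        return torch.sigmoid(self.head(self.embed(token_ids.unsqueeze(0)))).squeeze()

def score_steps(prm: ToyPRM, tokenize, question: str, steps: list[str]) -> list[float]:
    """Assign a reward to each step, conditioning on all preceding steps."""
    rewards = []
    for t in range(len(steps)):
        context = question + " " + " ".join(steps[: t + 1])
        rewards.append(prm(tokenize(context)).item())
    return rewards

# Usage with a trivial hash-based "tokenizer" (a real PRM would reuse the
# base model's tokenizer). The model is randomly initialized here, so the
# printed rewards are meaningless; a trained checkpoint would be loaded instead.
tokenize = lambda s: torch.tensor([hash(w) % 50_000 for w in s.split()])
prm = ToyPRM()
steps = ["Let x = 3.", "Then 2x = 6.", "So the answer is 6."]
print(score_steps(prm, tokenize, "What is 2x if x = 3?", steps))

Note the design choice in score_steps: each step is scored in the context of everything before it, which is what distinguishes a process reward from an outcome reward computed only on the final answer.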
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Related
LLM-Based Step-Level Verifier
Rule-Based Step-Level Verifier
Utility-Predicting Step-Level Verifier
Expert-Based Step-Level Verification
Process Reward Model (PRM)
Selecting an Appropriate Step-Level Verifier
Match each description of a method for evaluating an individual reasoning step with the corresponding verifier type.
A system is designed to solve complex mathematical proofs, generating one logical step at a time. The validity of each new step depends entirely on whether it follows from the previous steps according to the strict, formal rules of logic and algebra. Which of the following verifier types would be the least effective and least reliable for this specific task?
Richer Annotation Schemes for Reasoning Steps
Improving Annotation Efficiency with Active Learning
Prioritizing Annotation on Confidently Incorrect Reasoning Steps
Process-Based Reward Model as a Classification Task
A development team is training a language model to generate step-by-step solutions to complex logic puzzles. The primary objective is to improve the model's ability to construct a valid and coherent reasoning path, not just to arrive at the correct final conclusion. The team plans to use human annotators to provide feedback on the model's generated solutions. Which of the following annotation strategies is most directly aligned with improving the model's reasoning process?
Improving an AI Math Tutor's Reasoning
Evaluating Annotation Strategies for AI Training
Learn After
Comparison of Process and Outcome Reward Models
Data Collection Challenges for Process Reward Models
Evaluating a Feedback Strategy for an AI Tutor
An AI development team is training a model to solve complex, multi-step mathematical problems. Their primary goal is to ensure the model learns a logically sound reasoning process, rather than just arriving at the correct final answer through flawed logic. Which of the following training components would be most effective for providing the detailed, step-by-step guidance needed to achieve this goal?
A research team is developing a language model to generate high-quality, step-by-step solutions to physics problems. To ensure the model's reasoning is sound at each stage, they are training a separate verifier model that provides a reward for each step. Arrange the following actions into the correct chronological sequence for this training and feedback process.