Definition

Process Reward Model (PRM)

A Process Reward Model (PRM) functions as a step-level verifier that assesses the quality of intermediate steps in a reasoning process. It is often realized as an independent language model specifically trained to assign a numerical score, or reward, to each step ($a_i$) in a sequence. This approach is particularly effective when incorporating human feedback into the evaluation.
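As a rough illustration of step-level scoring, the sketch below walks through a list of reasoning steps and asks a scorer for one reward per step. Everything named here is hypothetical: `score_steps`, `aggregate_min`, and `toy_scorer` are illustrative names, `toy_scorer` is a trivial stand-in for a trained PRM, and taking the minimum step reward as the chain-level score is only one possible aggregation choice, assumed here for simplicity.

```python
from typing import Callable, List

# Hypothetical interface: a scorer maps (problem, earlier steps, current step)
# to a numerical reward. A real PRM would be a trained language model.
StepScorer = Callable[[str, List[str], str], float]


def score_steps(problem: str, steps: List[str], scorer: StepScorer) -> List[float]:
    """Return one reward per step, scoring each step given the steps before it."""
    rewards = []
    for i, step in enumerate(steps):
        rewards.append(scorer(problem, steps[:i], step))
    return rewards


def aggregate_min(rewards: List[float]) -> float:
    """Assumed aggregation: a chain is only as good as its weakest step."""
    return min(rewards) if rewards else 0.0


def toy_scorer(problem: str, prefix: List[str], step: str) -> float:
    """Placeholder for a trained PRM: rewards steps that state an equation."""
    return 1.0 if "=" in step else 0.0


steps = ["3 * 4 = 12", "so the answer is large", "2 + 12 = 14"]
rewards = score_steps("Compute 2 + 3 * 4", steps, toy_scorer)
chain_score = aggregate_min(rewards)
```

In this sketch the second step receives a low reward, so the whole chain scores low under the minimum rule; other aggregations, such as a product or an average of step rewards, would weight that step differently.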

Updated 2026-05-06

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
