Concept

Process Reward Models

A process reward model (PRM) is a type of verifier used in reinforcement learning for LLMs that scores the quality of each intermediate step in a reasoning path, rather than only the final answer. This step-level feedback is more granular than outcome-only evaluation and is conceptually similar to step-level verifiers.
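The idea can be sketched as follows. A PRM assigns a score to every step of a reasoning trajectory, and the per-step scores are then aggregated into a single trajectory-level reward. In this minimal sketch, `score_step` is a hypothetical stand-in for a learned per-step model (here just a toy heuristic), and `min` is used as one common aggregation choice; both names and the heuristic are assumptions for illustration, not part of the original text.

```python
def score_step(step: str) -> float:
    """Toy stand-in for a learned per-step reward model (hypothetical)."""
    # Reward steps that show explicit computation; give empty steps zero.
    if not step.strip():
        return 0.0
    return 0.9 if "=" in step else 0.5


def process_reward(steps: list[str]) -> float:
    """Aggregate per-step scores into one trajectory reward.

    Taking the minimum means a single weak step caps the reward for
    the whole reasoning path; other aggregations (mean, product) are
    also used in practice.
    """
    scores = [score_step(s) for s in steps]
    return min(scores) if scores else 0.0


trajectory = [
    "Let x be the number of apples.",
    "2x + 3 = 11, so 2x = 8.",
    "x = 4.",
]
print(process_reward(trajectory))  # capped by the weakest step's score
```

Contrast this with an outcome reward model, which would look only at the last step ("x = 4.") and ignore how the model got there.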


Updated 2026-05-06


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences