Activity (Process)

Scoring Reasoning Paths by Counting Correct Steps

Once a process-based reward model has been trained to classify each step of a reasoning path, it can be used to evaluate the quality of the path as a whole. A straightforward way to do this is to aggregate the step-level classifications by counting the steps that the model labels as 'correct'. This count then serves as the overall score, or reward, for the reasoning path.
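The counting rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation. The step classifier is passed in as a function, and `count_correct_steps`, `pick_best_path`, and `toy_classifier` are hypothetical names. The toy classifier only checks the arithmetic in steps of the form "a + b = c" and stands in for a trained process-based reward model.

```python
from typing import Callable, Sequence

# Hypothetical interface: (problem, preceding steps, current step) -> True if the step is judged 'correct'.
StepClassifier = Callable[[str, Sequence[str], str], bool]


def count_correct_steps(problem: str, steps: Sequence[str], classify: StepClassifier) -> int:
    """Score a reasoning path as the number of steps classified as 'correct'."""
    score = 0
    for i, step in enumerate(steps):
        # Each step is judged in the context of the problem and the steps before it.
        if classify(problem, steps[:i], step):
            score += 1
    return score


def pick_best_path(problem: str, paths: Sequence[Sequence[str]], classify: StepClassifier) -> Sequence[str]:
    """Return the candidate path with the highest correct-step count."""
    return max(paths, key=lambda path: count_correct_steps(problem, path, classify))


def toy_classifier(problem: str, previous: Sequence[str], step: str) -> bool:
    # Toy stand-in for a trained reward model (assumption): verifies "a + b = c" literally.
    lhs, rhs = step.split("=")
    a, b = lhs.split("+")
    return int(a) + int(b) == int(rhs)


problem = "Compute 2 + 3 + 4 + 1."
path_a = ["2 + 3 = 5", "5 + 4 = 9", "9 + 1 = 10"]    # every step is correct
path_b = ["2 + 3 = 6", "6 + 4 = 10", "10 + 1 = 11"]  # first step is wrong

print(count_correct_steps(problem, path_a, toy_classifier))  # 3
print(count_correct_steps(problem, path_b, toy_classifier))  # 2
print(pick_best_path(problem, [path_a, path_b], toy_classifier))
```

Because the score is a raw count, it grows with the number of steps, so under this rule a longer path can outscore a shorter one simply by containing more steps labeled 'correct'.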


Updated 2026-05-03


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models