Scoring Reasoning Paths by Counting Correct Steps
Once a process-based reward model is trained to classify each step of a reasoning path, it can be used to evaluate the overall quality of the entire path. A straightforward evaluation method aggregates the step-level classifications by counting the number of steps the model labels 'correct'. This count serves as the overall score, or reward, for the reasoning path.
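The counting scheme above can be sketched as a small function. This is a minimal illustration, not a specific implementation from the source: the function name, the per-step probabilities, and the 0.5 decision threshold are all assumptions; in practice the probabilities would come from a trained process-based reward model's per-step classification head.

```python
from typing import List

def score_reasoning_path(step_correct_probs: List[float],
                         threshold: float = 0.5) -> int:
    """Score a reasoning path by counting the steps classified as 'correct'.

    step_correct_probs: hypothetical per-step probabilities (one per
    reasoning step) that the step is correct, as a process-based reward
    model might output. A step counts as 'correct' when its probability
    exceeds the threshold (0.5 is an assumed cutoff).
    """
    return sum(1 for p in step_correct_probs if p > threshold)

# Example: a 5-step path where the model flags step 3 as likely incorrect.
probs = [0.92, 0.88, 0.31, 0.77, 0.95]
print(score_reasoning_path(probs))  # 4
```

Note that this score only counts correct steps; two paths of different lengths, or a path whose errors occur early versus late, can receive the same reward.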
