Activity (Process)

Log-Probability-Based Reward for Reasoning Paths

An alternative way to evaluate a reasoning path uses the log-probabilities produced by the step-level classification model. Instead of simply counting the number of steps deemed 'correct', this approach aggregates the per-step log-probabilities (for example, by summing them) into a total reward for the entire path. Because the score reflects how confident the model is in each step rather than only a binary decision, it is more nuanced than a step count.
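The contrast between the two scoring schemes can be illustrated with a minimal sketch. It assumes the step-level classifier returns, for each step, the probability of the 'correct' label, and that aggregation is a plain sum of log-probabilities; the function names, the 0.5 threshold, and the sample probabilities are hypothetical and not taken from the source.

```python
import math

def count_reward(step_probs, threshold=0.5):
    """Baseline: count the steps whose 'correct' probability exceeds the threshold.

    The threshold of 0.5 is an illustrative assumption.
    """
    return sum(1 for p in step_probs if p > threshold)

def log_prob_reward(step_probs):
    """Sum the log-probabilities of the 'correct' label over all steps.

    Assumes every probability is strictly positive; math.log raises
    ValueError for 0.
    """
    return sum(math.log(p) for p in step_probs)

# Two hypothetical three-step paths: the classifier labels every step
# 'correct' in both, but with very different confidence.
path_a = [0.95, 0.90, 0.99]
path_b = [0.55, 0.60, 0.51]

print(count_reward(path_a), count_reward(path_b))    # identical step counts
print(log_prob_reward(path_a), log_prob_reward(path_b))
```

Both paths receive the same count-based reward, while the log-probability reward ranks the confidently judged path higher. Note that a plain sum is never positive and tends to decrease as paths get longer, so a practical setup may need some form of length normalization; that choice is outside what the text above specifies.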

Updated 2026-05-03

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models