1Cademy - Evaluating AI Reasoning Strategies

Learn Before

Log-Probability-Based Reward for Reasoning Paths

Case Study

Evaluating AI Reasoning Strategies

You are evaluating two reasoning paths generated by an AI for the same problem. A scoring system calculates the total reward for a path by summing the log-probabilities of each step being 'correct', as determined by an automated checker. Based on the step-level probabilities provided in the case study, which path would this scoring system prefer? Justify your answer by explaining how this scoring method treats paths with varying levels of certainty across their steps.
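The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the case study's actual checker: the function name `path_reward` and the step probabilities for `path_a` and `path_b` are hypothetical values chosen only to show the effect, and they are not the probabilities from the case study.

```python
import math

def path_reward(step_probs):
    """Sum of log-probabilities that each step is correct (hypothetical helper)."""
    total = 0.0
    for p in step_probs:
        if p <= 0.0:
            # log(0) is undefined; treat an impossible step as reward -infinity
            return float("-inf")
        total += math.log(p)
    return total

# Hypothetical step-level probabilities, for illustration only.
path_a = [0.90, 0.90, 0.90]   # uniformly fairly confident
path_b = [0.99, 0.99, 0.50]   # two near-certain steps, one uncertain step

reward_a = path_reward(path_a)
reward_b = path_reward(path_b)

print(f"Path A reward: {reward_a:.4f}")
print(f"Path B reward: {reward_b:.4f}")
print("Preferred:", "A" if reward_a > reward_b else "B")
```

Because the sum of logs equals the log of the product of the step probabilities, a single low-probability step lowers the total sharply. With these assumed numbers, the consistently confident path scores higher than the path containing one uncertain step, even though the latter has two near-certain steps.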

Updated 2025-10-05
