Learn Before
Evaluating AI Reasoning Strategies
You are evaluating two reasoning paths generated by an AI for the same problem. A scoring system calculates the total reward for a path by summing the log-probabilities of each step being 'correct', as determined by an automated checker. Based on the step-level probabilities provided in the case study, which path would this scoring system prefer? Justify your answer by explaining how this scoring method treats paths with varying levels of certainty across their steps.
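As a sketch, the scoring rule described above can be written as follows. The symbols $R$, $T$, and $p_t$ are notation introduced here for illustration; they are not taken from the case study.

```latex
R(\text{path}) = \sum_{t=1}^{T} \log p_t, \qquad p_t \in (0, 1]
```

Here $p_t$ is the checker's probability that step $t$ is 'correct' and $T$ is the number of steps. Since every $\log p_t \le 0$, each step can only lower the total, and one step with a small $p_t$ contributes a large negative term.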
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Formula for Log-Probability-Based Reward for Reasoning Paths
Consider two methods for scoring a multi-step reasoning process generated by an AI. Both methods use an underlying model that, for each step, outputs a probability that the step is 'correct'.
- Method A: Assigns a score of +1 to each step where the probability of being 'correct' is greater than 0.5. The total score is the sum of these step scores.
- Method B: Calculates the total score by summing the logarithm of the 'correct' probability for every step in the process.
Now, analyze two reasoning paths for the same problem:
- Path 1: Consists of 3 steps, each with a 'correct' probability of 0.9.
- Path 2: Consists of 3 steps, each with a 'correct' probability of 0.6.
Which statement accurately compares how these two methods would score the paths?
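The two methods can be compared numerically with a minimal sketch. The function names `threshold_score` and `log_prob_score` are hypothetical, and the natural logarithm is an assumption, since the text does not specify a base; a different base rescales Method B's scores without changing which path ranks higher.

```python
import math


def threshold_score(probs, threshold=0.5):
    """Method A: +1 for each step whose 'correct' probability exceeds the threshold."""
    return sum(1 for p in probs if p > threshold)


def log_prob_score(probs):
    """Method B: sum of the log of each step's 'correct' probability (natural log assumed)."""
    return sum(math.log(p) for p in probs)


# Step-level 'correct' probabilities from the two paths described above.
path_1 = [0.9, 0.9, 0.9]
path_2 = [0.6, 0.6, 0.6]

for name, probs in (("Path 1", path_1), ("Path 2", path_2)):
    print(name, threshold_score(probs), round(log_prob_score(probs), 3))
# Path 1 3 -0.316
# Path 2 3 -1.532
```

Under these assumptions, Method A gives both paths the same score of 3, while Method B separates them and ranks Path 1 higher because its steps carry higher probabilities.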
Evaluating AI Reasoning Strategies
Nuanced Evaluation of Reasoning Paths