1Cademy

Learn Before

Log-Probability-Based Reward for Reasoning Paths

Multiple Choice

Consider two methods for scoring a multi-step reasoning process generated by an AI. Both methods use an underlying model that, for each step, outputs a probability that the step is 'correct'.

Method A: Assigns a score of +1 to each step where the probability of being 'correct' is greater than 0.5. The total score is the sum of these step scores.
Method B: Calculates the total score by summing the logarithm of the 'correct' probability for every step in the process.

Now, analyze two reasoning paths for the same problem:

Path 1: Consists of 3 steps, each with a 'correct' probability of 0.9.
Path 2: Consists of 3 steps, each with a 'correct' probability of 0.6.

Which statement accurately compares how these two methods would score the paths?
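
To make the comparison concrete, the two scoring rules can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `threshold_score` and `log_prob_score` are hypothetical, the natural logarithm is assumed because the question does not state a base, and steps at or below the 0.5 threshold are assumed to contribute 0 under Method A.

```python
import math

def threshold_score(probs, threshold=0.5):
    """Method A: +1 for each step whose 'correct' probability exceeds the threshold.

    Assumption: steps at or below the threshold contribute 0.
    """
    return sum(1 for p in probs if p > threshold)

def log_prob_score(probs):
    """Method B: sum of log-probabilities over all steps.

    Assumption: natural logarithm; the question does not specify a base.
    """
    return sum(math.log(p) for p in probs)

path_1 = [0.9, 0.9, 0.9]
path_2 = [0.6, 0.6, 0.6]

for name, path in (("Path 1", path_1), ("Path 2", path_2)):
    print(name, threshold_score(path), round(log_prob_score(path), 3))
# Path 1 3 -0.316
# Path 2 3 -1.532
```

Under these assumptions, Method A gives both paths the same score of 3, while Method B gives Path 1 a higher (less negative) score than Path 2. A different logarithm base would rescale the Method B values but not change their ordering.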

Updated 2025-09-28
