Scoring an AI's Reasoning Process
Consider a method for evaluating a multi-step solution generated by an AI: the overall quality of the solution is the total number of individual steps identified as 'correct'. Given the case study below, calculate the final score for the AI's reasoning path and briefly explain how you arrived at your answer.
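As a minimal sketch of the counting rule described above, the score can be computed by summing one point per step marked 'correct'. The function name `score_reasoning_path` and the example labels are hypothetical and chosen only for illustration; they are not taken from the case study.

```python
from typing import Sequence


def score_reasoning_path(step_labels: Sequence[bool]) -> int:
    """Return the number of steps labeled correct (True) in a reasoning path."""
    # Each correct step contributes 1 to the score; incorrect steps contribute 0.
    return sum(1 for is_correct in step_labels if is_correct)


# Hypothetical labels for a four-step solution (assumed, not the case study's data):
# steps 1, 2, and 4 are judged correct, step 3 is judged incorrect.
example_labels = [True, True, False, True]
example_score = score_reasoning_path(example_labels)
print(example_score)  # 3
```

Under this rule the position of an incorrect step does not affect the score; only the count of correct steps does.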
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Formula for Scoring Reasoning Paths by Counting Correct Steps
Scoring an AI's Reasoning Process
An AI's multi-step solution to a complex problem is evaluated by a separate model that classifies each step as either 'correct' or 'incorrect'. The final quality score for the entire solution is calculated by summing the total number of steps classified as 'correct'. What is a primary conceptual limitation of this evaluation approach?
Calculating a Reasoning Path Score