Learn Before
An AI model generates a step-by-step solution to a complex math problem, and its final answer is correct. On review, however, one intermediate calculation step turns out to contain a logical error. A later error happens to cancel it out, so the solution still arrives at the right final number. If an evaluator's goal is to reward sound reasoning by assessing the validity of each individual step, how would this solution be scored?
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.5 Inference - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Step-Level Search with Verifiers
Evaluating AI Reasoning for Tutoring
Evaluating an AI-Generated Travel Plan