Learn Before
Analyzing the Effectiveness of a Reasoning Technique
A research team is evaluating a large language model's mathematical reasoning abilities using a benchmark composed of multi-step grade school word problems. They observe that when the model is prompted to provide only the final numerical answer, its accuracy is low. However, when they modify the prompt to instruct the model to first outline the sequence of calculations and logical steps it will take before providing the final answer, the model's accuracy on the benchmark improves dramatically. Analyze the underlying reasons for this significant performance improvement. In your response, break down the relationship between the structure of the problems in the benchmark and the process the model is guided to follow.
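The two prompting styles described above can be sketched as simple prompt templates. This is a minimal illustration, not the benchmark's actual prompts; the function names and wording are hypothetical:

```python
def direct_prompt(problem: str) -> str:
    # Direct style: ask for the final numerical answer only.
    return f"{problem}\nGive only the final numerical answer."

def chain_of_thought_prompt(problem: str) -> str:
    # Chain-of-thought style: instruct the model to first outline
    # the sequence of calculations and logical steps, then answer.
    return (
        f"{problem}\n"
        "First, outline the sequence of calculations and logical steps "
        "you will take. Then give the final numerical answer."
    )

problem = ("A farmer has 15 apples. He sells 7 of them and then buys "
           "5 more. How many apples does the farmer have now?")
print(direct_prompt(problem))
print(chain_of_thought_prompt(problem))
```

The only difference between the two templates is the added instruction to externalize intermediate steps, which is what lets the model decompose a multi-step problem before committing to an answer.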
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models Course
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A large language model is given the following math word problem: 'A farmer has 15 apples. He sells 7 of them and then buys 5 more. How many apples does the farmer have now?' Which of the following outputs best exemplifies the result of applying a chain-of-thought prompting technique to solve this problem?
Improving LLM Performance on a Reasoning Benchmark