Insufficiency of Simple Demonstrations for LLM Reasoning Tasks
Large Language Models often fail at reasoning tasks even when the prompt includes a demonstration in the form of a similar question-answer pair. Such a demonstration shows only the final answer, not the intermediate steps, so the model picks up the input-output format without the underlying reasoning process it typically needs to solve a new problem. Making that process explicit in the demonstration, as chain-of-thought prompting does, is often what allows the model to generalize correctly (see the sketch below).
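To make the contrast concrete, here is a minimal sketch in Python comparing a bare question-answer demonstration with one that spells out its reasoning steps. The example problems, the numbers, and the `ask_model` placeholder are illustrative assumptions, not taken from the source material.

```python
# Two ways to demonstrate a math word problem to an LLM.
# Both prompts below are illustrative assumptions, not from the source text.

# 1) A "simple demonstration": a question-answer pair with no reasoning.
#    The model sees WHAT the answer is, but not HOW it was derived.
simple_demo_prompt = """\
Q: Tom has 3 apples and buys 2 more. How many apples does he have?
A: 5

Q: A box holds 4 red balls and 7 blue balls. If 3 balls are removed, how many remain?
A:"""

# 2) A chain-of-thought demonstration: the same pair, but the answer
#    spells out the intermediate steps the model should imitate.
cot_demo_prompt = """\
Q: Tom has 3 apples and buys 2 more. How many apples does he have?
A: Tom starts with 3 apples. Buying 2 more gives 3 + 2 = 5. The answer is 5.

Q: A box holds 4 red balls and 7 blue balls. If 3 balls are removed, how many remain?
A:"""


def ask_model(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM call (e.g., an HTTP request
    to a hosted model). Kept abstract so the sketch stays self-contained."""
    raise NotImplementedError("wire this to your model API of choice")


if __name__ == "__main__":
    # With the simple demo, the model tends to pattern-match on surface
    # form; with the CoT demo, it is nudged to emit reasoning steps
    # (4 + 7 = 11; 11 - 3 = 8) before committing to a final answer.
    print(simple_demo_prompt)
    print(cot_demo_prompt)
```

In practice, the second prompt tends to elicit the intermediate arithmetic before the final answer, which is exactly the reasoning process the simple demonstration leaves implicit.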
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.3 Prompting - Foundations of Large Language Models
Related
GSM8K Benchmark
A user gives a language model the following prompt: 'I have a box that contains a red ball and a blue ball. I take the red ball out and put it on the table. What is left in the box?' The model responds: 'The box contains a red ball and a blue ball.' Which of the following best analyzes the likely cause of the model's incorrect answer?
Commonsense Reasoning as a Challenging Task for LLMs
In-Context Learning (ICL)
The Challenge of Multi-Step Logical Inference for LLMs in Arithmetic Reasoning
Language Model Scheduling Error Analysis
Predicting LLM Reasoning Flaws
Example of a Prompt for Calculating the Average of 1, 3, 5, and 7
A user wants a language model to extract the main sentiment from a sentence and express it as a single emoji. The initial prompt, 'What is the sentiment of this sentence, as an emoji: The movie was a spectacular triumph!', results in the model responding with text like 'The sentiment is positive.' Which of the following revised prompts is structured to be the most effective at teaching the model the desired task?
Improving LLM Classification Consistency
Crafting an Effective Prompt with Demonstrations
Example of a Few-Shot Prompt for Polarity Classification
Learn After
Chain-of-Thought (CoT) Prompting
Explicitly Prompting for a Reasoning Process to Prevent Errors
A user wants a language model to solve a multi-step math word problem. The user's prompt includes an example of a different, but structurally similar, word problem along with its final numerical answer. Despite this example, the model fails to solve the new problem correctly. Which statement best analyzes the most probable cause of the model's failure?
Analyzing a Failed Prompt for a Logic Puzzle
Diagnosing LLM Prompting Failures