Learn Before
Multimodal Chain-of-Thought Reasoning
Multimodal chain-of-thought reasoning extends step-by-step prompting to inputs that combine text and images. Language models that reason over both modalities can achieve higher reasoning accuracy than when they rely on text alone.
Tags
D2L
Dive into Deep Learning @ D2L
Related
Example of a Model Initiating Chain-of-Thought Reasoning
A user is trying to solve a word problem using a large language model. They use two different prompts and receive two different responses.
Prompt 1: "A farmer has 5 pens, and each pen holds 8 chickens. The farmer buys 10 more chickens. How many chickens does the farmer have in total?"
Response 1: "The farmer has 50 chickens in total."
Prompt 2: "A farmer has 5 pens, and each pen holds 8 chickens. The farmer buys 10 more chickens. How many chickens does the farmer have in total? Let's think step by step."
Response 2: "First, we find the initial number of chickens. The farmer has 5 pens with 8 chickens each, so 5 * 8 = 40 chickens. Then, the farmer buys 10 more chickens. So, we add those to the initial amount: 40 + 10 = 50 chickens. The farmer has 50 chickens in total."
Based on the information provided, what is the most likely reason for the difference in the structure and detail of the two responses?
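As a minimal sketch, the two prompts above can be built programmatically and the arithmetic in Response 2 checked by hand. The helper name `add_cot_trigger` and the constant `COT_TRIGGER` are hypothetical, introduced only for illustration. No model is called here; the code only constructs the prompt strings and reproduces the intermediate steps.

```python
# Hypothetical constant: the trailing phrase that distinguishes Prompt 2 from Prompt 1.
COT_TRIGGER = "Let's think step by step."


def add_cot_trigger(prompt: str, trigger: str = COT_TRIGGER) -> str:
    """Return the prompt with a step-by-step trigger phrase appended."""
    return f"{prompt.rstrip()} {trigger}"


question = (
    "A farmer has 5 pens, and each pen holds 8 chickens. "
    "The farmer buys 10 more chickens. "
    "How many chickens does the farmer have in total?"
)

prompt_1 = question                   # direct question, as in Prompt 1
prompt_2 = add_cot_trigger(question)  # same question plus the trigger, as in Prompt 2

# Reproduce the intermediate steps spelled out in Response 2.
pens, chickens_per_pen, chickens_bought = 5, 8, 10
initial_chickens = pens * chickens_per_pen          # 5 * 8 = 40
total_chickens = initial_chickens + chickens_bought  # 40 + 10 = 50

print(prompt_2)
print(initial_chickens, total_chickens)
```

Both responses reach the same total; the sketch only makes explicit that the prompts are identical apart from the appended phrase.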
Modifying a Prompt for Step-by-Step Reasoning
Improving AI Reasoning for a Multi-Step Problem
Improving Chain-of-Thought Reasoning
Zero-Shot Chain-of-Thought Reasoning
Multimodal Chain-of-Thought Reasoning