GSM8K Benchmark
The GSM8K (Grade School Math 8K) dataset, introduced by Cobbe et al. in 2021, is a widely used benchmark for assessing the multi-step arithmetic reasoning abilities of Large Language Models. It comprises roughly 8,500 math word problems written at a grade-school level. To evaluate a model, one prompts it to generate a natural-language solution for each problem, and the final numeric answer in that solution is compared against the reference answer.
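The evaluation described above can be sketched in a few lines. This is a minimal illustration, not an official scoring script: it assumes each reference solution ends with a line of the form `#### <answer>` (the convention used in the released dataset), and it assumes the model's answer is the last number appearing in its generated text. The helper names `final_number`, `reference_answer`, and `accuracy` are hypothetical.

```python
import re

# Matches integers and decimals, optionally negative, with optional
# thousands separators (e.g. "72", "-3.5", "1,200").
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def final_number(text):
    """Return the last number found in text as a float, or None if there is none."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def reference_answer(solution):
    """Extract the gold answer, assumed to follow the '####' marker."""
    return final_number(solution.split("####")[-1])


def accuracy(predictions, references):
    """Fraction of model outputs whose final number equals the gold answer."""
    correct = 0
    for prediction, reference in zip(predictions, references):
        predicted = final_number(prediction)
        if predicted is not None and predicted == reference_answer(reference):
            correct += 1
    return correct / len(references)
```

Taking the last number in the output is a simple heuristic; a model that keeps writing after stating its answer can be mis-scored, so stricter setups often ask the model to mark its final answer explicitly.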
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Insufficiency of Simple Demonstrations for LLM Reasoning Tasks
A user gives a language model the following prompt: 'I have a box that contains a red ball and a blue ball. I take the red ball out and put it on the table. What is left in the box?' The model responds: 'The box contains a red ball and a blue ball.' Which of the following best analyzes the likely cause of the model's incorrect answer?
Commonsense Reasoning as a Challenging Task for LLMs
In-Context Learning (ICL)
The Challenge of Multi-Step Logical Inference for LLMs in Arithmetic Reasoning
Language Model Scheduling Error Analysis
Predicting LLM Reasoning Flaws