Learn Before
  • Challenging Reasoning Tasks for LLMs

GSM8K Benchmark

The GSM8K (Grade School Math 8K) dataset, introduced by Cobbe et al. in 2021, is a prominent benchmark for assessing the reasoning abilities of Large Language Models. It comprises roughly 8,500 grade-school math word problems, each requiring a short chain of elementary arithmetic steps to solve. To evaluate a model, it is prompted to generate a natural-language solution for each problem; the final numeric answer is then extracted from the generated solution and compared against the reference answer.

Tags
  • Ch.3 Prompting - Foundations of Large Language Models

  • Foundations of Large Language Models

  • Foundations of Large Language Models Course

  • Computing Sciences
Related
  • GSM8K Benchmark

  • Insufficiency of Simple Demonstrations for LLM Reasoning Tasks

  • A user gives a language model the following prompt: 'I have a box that contains a red ball and a blue ball. I take the red ball out and put it on the table. What is left in the box?' The model responds: 'The box contains a red ball and a blue ball.' Which of the following best analyzes the likely cause of the model's incorrect answer?

  • Commonsense Reasoning as a Challenging Task for LLMs

  • In-Context Learning (ICL)

  • The Challenge of Multi-Step Logical Inference for LLMs in Arithmetic Reasoning

  • Language Model Scheduling Error Analysis

  • Predicting LLM Reasoning Flaws

Learn After
  • Application of COT Prompting on GSM8K Benchmark

  • Example of a GSM8K Word Problem (Softball)