1Cademy - GSM8K Benchmark

Dataset

GSM8K Benchmark

The GSM8K (Grade School Math 8K) dataset, introduced by Cobbe et al. in 2021, is a prominent benchmark for assessing the reasoning abilities of Large Language Models. It comprises roughly 8,500 math word problems written at a grade school level, each solvable with a short sequence of basic arithmetic steps. To evaluate a model, one prompts it to generate a natural-language solution for each problem, and the final numerical answer in that solution is compared against the reference answer.
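The scoring step can be sketched in a few lines of Python. This is a minimal illustration, not the official evaluation code: it assumes the reference solutions end with a `#### <answer>` line, as in the publicly released dataset, and it uses a simple "last number in the output" heuristic to read the model's answer. The function names and the two toy examples are hypothetical and are not taken from the dataset.

```python
import re
from typing import Optional


def extract_reference_answer(solution: str) -> str:
    """Return the text after the final '####' marker, with thousands separators removed."""
    return solution.split("####")[-1].strip().replace(",", "")


def extract_predicted_answer(generation: str) -> Optional[str]:
    """Heuristic (an assumption of this sketch): use the last number in the model output."""
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", generation)
    if not numbers:
        return None
    return numbers[-1].replace(",", "")


def is_correct(generation: str, reference_solution: str) -> bool:
    """Compare predicted and reference answers numerically."""
    predicted = extract_predicted_answer(generation)
    if predicted is None:
        return False
    try:
        reference = float(extract_reference_answer(reference_solution))
        return abs(float(predicted) - reference) < 1e-6
    except ValueError:
        return False


def accuracy(generations: list, reference_solutions: list) -> float:
    """Fraction of problems whose final answer matches the reference."""
    pairs = list(zip(generations, reference_solutions))
    if not pairs:
        return 0.0
    return sum(is_correct(g, r) for g, r in pairs) / len(pairs)


# Toy, made-up examples (not real GSM8K items).
generations = [
    "Each box holds 6 eggs, so 3 boxes hold 3 * 6 = 18 eggs. The answer is 18.",
    "The total is 1,250 dollars.",
]
references = [
    "3 * 6 = 18\n#### 18",
    "#### 1200",
]
print(accuracy(generations, references))  # 0.5
```

Taking the last number is a deliberately crude choice: it works when the model states its answer at the end, but real evaluations often constrain the output format or use a stricter parser.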

Updated 2026-04-30
