Dataset

LIMA Dataset

The LIMA dataset is an instruction-following dataset consisting of only 1,000 highly curated samples derived from various natural language processing tasks. Fine-tuning a large model, such as the 65-billion-parameter LLaMA, on this small, carefully crafted set has been shown to produce performance competitive with or superior to that of models fine-tuned with substantially more data and effort.

Updated 2026-05-01

Tags

Foundations of Large Language Models

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences