LIMA Dataset
The LIMA dataset is an instruction-following dataset consisting of only 1,000 highly curated prompt-response pairs, drawn from community Q&A sources and manually authored examples. Fine-tuning a large model, such as the 65-billion-parameter LLaMA, on this small but carefully crafted set has been shown to produce performance competitive with, or superior to, models fine-tuned with substantially more data and effort.
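The core idea is that a small, high-quality set of instruction-response pairs can serve as the entire supervised fine-tuning corpus. A minimal sketch of how such pairs might be formatted into training strings is below; the field names (`instruction`, `response`) and the separator template are illustrative assumptions, not LIMA's actual format.

```python
def format_example(example: dict) -> str:
    """Concatenate one instruction-response pair into a single
    training string, with a simple separator between the turns.
    The template here is a hypothetical choice, not LIMA's own."""
    return f"{example['instruction']}\n\n### Response:\n{example['response']}"

# A curated dataset is just a short list of high-quality pairs;
# these examples are illustrative placeholders, not LIMA data.
curated = [
    {
        "instruction": "Explain overfitting in one sentence.",
        "response": "Overfitting is when a model memorizes training data "
                    "instead of learning patterns that generalize.",
    },
    {
        "instruction": "Write a haiku about autumn.",
        "response": "Red leaves drift and fall\nCool wind carries them away\n"
                    "The year grows quiet",
    },
]

# These strings would then be tokenized and fed to a standard
# supervised fine-tuning loop over the base language model.
train_texts = [format_example(ex) for ex in curated]
```

The point LIMA makes is that quality and diversity of such pairs matter far more than their quantity: a thousand well-chosen examples can align a strong base model.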
Tags
Foundations of Large Language Models
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A development team is fine-tuning a large, pre-trained language model to create a general-purpose assistant. One team member argues that to be effective, their fine-tuning dataset must contain examples of every conceivable task the assistant might be asked to perform, such as summarizing legal documents, writing poetry, translating between niche languages, and explaining complex scientific theories. Which of the following statements provides the most accurate critique of this team member's argument?
Evaluating Fine-Tuning Strategies
A large language model, initially trained on a vast and diverse corpus of text from the internet, is subsequently adjusted using a specialized dataset consisting only of 5,000 question-answer pairs about world geography. After this adjustment process, the model will be unable to generate a short poem about a sunset.