Learn Before
Sample Efficiency of Large Language Models
In addition to achieving higher overall performance, large language models exhibit superior sample efficiency compared to smaller models. That is, a larger model needs significantly fewer training samples (processed tokens) than a smaller model does to reach a given level of performance.
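The idea can be illustrated with a small numerical sketch. It assumes an additive power-law loss, L(N, D) = E + A / N^alpha + B / D^beta, where N is the parameter count and D is the number of training tokens. The functional form, every constant, and the helper names `loss` and `tokens_to_reach` are illustrative assumptions rather than anything stated in this card, so only the qualitative trend should be read from it.

```python
import math

# Hypothetical constants chosen only for illustration; they are not
# fitted to any real model family.
E = 1.7        # assumed irreducible loss
A = 400.0      # assumed coefficient of the model-size term
B = 410.0      # assumed coefficient of the data-size term
ALPHA = 0.34   # assumed exponent for model size
BETA = 0.28    # assumed exponent for data size


def loss(n_params: float, n_tokens: float) -> float:
    """Assumed loss as a function of parameter count and training tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def tokens_to_reach(target_loss: float, n_params: float) -> float:
    """Tokens needed for a model of n_params to reach target_loss.

    Returns math.inf when the model is too small to ever reach the
    target under the assumed loss form.
    """
    gap = target_loss - E - A / n_params**ALPHA
    if gap <= 0:
        return math.inf
    return (B / gap) ** (1.0 / BETA)


if __name__ == "__main__":
    target = 2.5
    for n in (1e9, 1e10, 1e11):
        d = tokens_to_reach(target, n)
        print(f"{n:.0e} parameters -> {d:.3e} tokens to reach loss {target}")
```

Under these assumptions, each increase in parameter count lowers the number of tokens required to hit the same target loss, which is the sample-efficiency effect described above.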

Tags
D2L
Dive into Deep Learning @ D2L
Related
A research team is training a large language model and has a fixed, non-negotiable computational budget. Their goal is to achieve the lowest possible final loss. Based on the established principles that govern the relationship between computation, model size, data size, and performance, which of the following strategies represents the most efficient use of their budget?
Evaluating an LLM Training Strategy
Analyzing Deviations from LLM Scaling Behavior
Continued Effectiveness of Scaling up Training in NLP
Power-Law Curve of Performance Scaling
Scaling Laws Across LLM Development Stages
Tandem Scaling of LLM Training Factors
Sample Efficiency of Large Language Models
Performance Scaling in GPT-3