1Cademy - Performance Scaling in GPT-3

Learn Before

GPT-3
Scaling Laws for LLMs

Concept

Performance Scaling in GPT-3

Empirical evaluations of GPT-3 show that its performance on a wide range of benchmarks improves consistently as its parameter count increases. Within the in-context learning paradigm, few-shot performance improves more steeply with model size than zero-shot or one-shot performance, so the gap between these settings tends to widen as the model scales up. GPT-3's cross-entropy validation loss also follows a predictable power law in the amount of compute used for training. This trend extends previously established power-law scaling by a further two orders of magnitude of compute, with only small deviations from the predicted curve.
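A power law of loss in compute can be written as L(C) = a · C^(-b), or equivalently log L = log a - b · log C, which is a straight line on log-log axes. That linearity is what makes the trend "predictable": constants fitted on smaller training runs can be extrapolated to larger compute budgets. The sketch below illustrates the idea in plain Python. It assumes hypothetical constants (a = 2.5, b = 0.05), arbitrary compute units, and noise-free synthetic data; these are not GPT-3's measured values, and the function names are illustrative only.

```python
import math


def power_law_loss(compute: float, a: float, b: float) -> float:
    """Loss predicted by the power law L(C) = a * C**(-b)."""
    return a * compute ** (-b)


def fit_power_law(computes, losses):
    """Fit L = a * C**(-b) by ordinary least squares on (log C, log L).

    Because log L = log a - b * log C, the slope of the log-log line is -b
    and its intercept is log a. Returns the tuple (a, b).
    """
    xs = [math.log(c) for c in computes]
    ys = [math.log(loss) for loss in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the log-log line, equal to -b
    intercept = mean_y - slope * mean_x    # intercept, equal to log a
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Hypothetical constants chosen for illustration, NOT GPT-3's fitted values.
    A_TRUE, B_TRUE = 2.5, 0.05

    # Synthetic "smaller runs" spanning compute 1e-4 .. 1e0 (arbitrary units).
    train_compute = [10.0 ** k for k in range(-4, 1)]
    train_loss = [power_law_loss(c, A_TRUE, B_TRUE) for c in train_compute]

    a_hat, b_hat = fit_power_law(train_compute, train_loss)
    print(f"fitted a={a_hat:.3f}, b={b_hat:.3f}")

    # Extrapolate up to two orders of magnitude beyond the fitted range.
    for c in (1e1, 1e2):
        print(f"C={c:g}: predicted loss {power_law_loss(c, a_hat, b_hat):.4f}")
```

With real training runs the measured losses scatter slightly around the fitted line, which corresponds to the "small deviations" mentioned above; fitting in log space keeps the problem a simple linear regression.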

Updated 2026-05-15

References

Dive into Deep Learning