
Performance Scaling in GPT-3

Empirical evaluations of GPT-3 show that its performance on a range of benchmarks improves consistently as its parameter count increases. Among the in-context learning settings (zero-shot, one-shot, and few-shot), few-shot performance improves most rapidly as the model size scales up. In addition, GPT-3's cross-entropy validation loss follows a predictable power-law trend in the amount of compute used for training. The power-law trend previously observed for smaller language models (Kaplan et al., 2020) continues for an additional two orders of magnitude of compute, with only small deviations from the predicted curve.
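
A power law of the form L(C) = a * C^(-alpha) becomes a straight line in log-log coordinates: log L = log a - alpha * log C. Its two parameters can therefore be estimated by ordinary least squares on the logarithms and then extrapolated to larger compute budgets. The Python sketch below illustrates this idea on synthetic data only. It rests on several assumptions, none of which come from the GPT-3 paper:

- The constants A_TRUE and ALPHA_TRUE are made up.
- The noise level is made up.
- The compute units are arbitrary.
- The helper names are hypothetical.

```python
import math
import random

# Hypothetical "ground truth" constants, chosen only for illustration
# (NOT the values fitted in the GPT-3 paper).
A_TRUE = 2.5
ALPHA_TRUE = 0.05


def power_law_loss(compute, a, alpha):
    """Hypothetical power law L(C) = a * C**(-alpha); loss falls as compute grows."""
    return a * compute ** (-alpha)


def synthetic_losses(computes, a, alpha, noise=0.0, seed=0):
    """Generate power-law losses with small multiplicative log-normal noise."""
    rng = random.Random(seed)
    return [power_law_loss(c, a, alpha) * math.exp(rng.gauss(0.0, noise))
            for c in computes]


def fit_power_law(computes, losses):
    """Least-squares fit of log L = log a - alpha * log C; returns (a, alpha)."""
    xs = [math.log(c) for c in computes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct compute values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope in log-log space is -alpha; the intercept is log a.
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Hypothetical compute budgets spanning three orders of magnitude: 1e-2 .. 1e1.
    train_compute = [10 ** (k / 2) for k in range(-4, 3)]
    losses = synthetic_losses(train_compute, A_TRUE, ALPHA_TRUE, noise=0.01, seed=42)
    a_hat, alpha_hat = fit_power_law(train_compute, losses)
    print(f"fitted a={a_hat:.3f}, alpha={alpha_hat:.4f}")
    # Extrapolate up to two orders of magnitude beyond the largest observed budget.
    for c in (1e1, 1e2, 1e3):
        print(f"C={c:g}: predicted={power_law_loss(c, a_hat, alpha_hat):.4f}, "
              f"true={power_law_loss(c, A_TRUE, ALPHA_TRUE):.4f}")
```

Fitting in log space treats every compute budget equally, regardless of its absolute scale. This is why a trend fitted on small models can be checked against, or extrapolated to, much larger ones. In this toy setting, "small deviations" shows up as the gap between the predicted and true columns.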

Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L
