1Cademy - GPT-3

Learn Before

GPT Series
Data Demand for Large Language Models
Common Data Sources for Pre-training LLMs
In-Context Learning
In-Context Learning (ICL)

Concept

GPT-3

GPT-3 is a decoder-only Transformer language model that scales up GPT-2 by roughly two orders of magnitude in both model size (from about 1.5 billion to 175 billion parameters) and training data, using $300$ billion pretraining tokens. It retains the GPT-2 architecture but alternates dense and locally banded sparse attention patterns across its layers, in the style of the Sparse Transformer. GPT-3 provided strong empirical support for the in-context learning paradigm, showing that few-shot performance improves markedly as model capacity increases.
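To make the few-shot setting concrete, here is a minimal sketch of how an in-context learning prompt might be assembled: a task description, a handful of demonstrations, and an unanswered query, all passed to the model as plain text with no weight updates. The function name `build_few_shot_prompt`, the `=>` separator, and the translation examples are illustrative assumptions, not a format prescribed by this page.

```python
def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a prompt: task description, K demonstrations, then the open query.

    Hypothetical helper for illustration only; the "source => target" layout
    is an assumed convention, not a required one.
    """
    lines = [task_description, ""]
    for source, target in demonstrations:
        # Each demonstration shows the model one solved input/output pair.
        lines.append(f"{source} => {target}")
    # The query is left unanswered so the model completes it.
    lines.append(f"{query} =>")
    return "\n".join(lines)


demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]
prompt = build_few_shot_prompt("Translate English to French:", demos, "peppermint")
print(prompt)
```

Passing an empty demonstration list gives the zero-shot variant of the same prompt, and a single pair gives the one-shot variant. The model's parameters are identical in every case; only the text in the context window changes.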