Concept

GPT-3

GPT-3 is a large Transformer decoder model that scales up GPT-2 by roughly two orders of magnitude in both model size and training data, using about 300 billion pretraining tokens. It keeps the GPT-2 architecture but uses sparse attention patterns in alternating layers. GPT-3 provided strong evidence for the in-context learning paradigm, showing that few-shot performance improves quickly as model capacity grows.
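In-context learning means the task is specified entirely in the prompt text, with no gradient updates. A minimal sketch of how a few-shot prompt might be assembled is shown below. The helper name `build_few_shot_prompt` and the `Input:`/`Output:` template are illustrative assumptions, not a format prescribed by GPT-3.

```python
def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a plain-text few-shot prompt (hypothetical template).

    demonstrations: list of (input, output) pairs placed in the context.
    The model only conditions on this text; its weights are not updated.
    """
    lines = [task_description, ""]
    for source, target in demonstrations:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final query is left open so the model completes the answer.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


demos = [("cheese", "fromage"), ("house", "maison")]
prompt = build_few_shot_prompt("Translate English to French.", demos, "cat")
print(prompt)
```

Passing an empty demonstration list gives the zero-shot case, and a single pair gives the one-shot case, so the same helper covers the settings usually compared when measuring in-context learning.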

Updated 2026-05-15

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

D2L

Dive into Deep Learning @ D2L

Related