1Cademy - GPT-2

GPT-2

Released in 2019, a year after its predecessor GPT, GPT-2 is a significantly larger Transformer-decoder language model with $1.5$ billion parameters in its largest configuration, pretrained on $40$ GB of web text. It introduced architectural refinements such as pre-normalization (layer normalization applied to the input of each sub-layer rather than after the residual addition), along with a modified initialization that scales residual-layer weights by $1/\sqrt{N}$, where $N$ is the number of residual layers. GPT-2 was groundbreaking for achieving state-of-the-art results on several language modeling benchmarks in a zero-shot setting, and promising results on multiple other tasks, without requiring any parameter updates or architectural modifications.
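
The two refinements mentioned above, pre-normalization and scaled initialization, can be illustrated with a minimal NumPy sketch. This is not GPT-2's actual code: the function names, the toy linear `sublayer`, the base standard deviation of `0.02`, and the choice of `48` residual layers are all assumptions made for illustration, and the layer norm omits learned gain and bias for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize the last axis to zero mean and unit variance (no learned gain/bias)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def post_norm_block(x, sublayer):
    """Post-normalization: normalize after the residual addition."""
    return layer_norm(x + sublayer(x))

def pre_norm_block(x, sublayer):
    """Pre-normalization: normalize the sub-layer input; the residual path stays untouched."""
    return x + sublayer(layer_norm(x))

def init_residual_projection(rng, d_in, d_out, n_residual_layers, base_std=0.02):
    """Draw weights whose std is scaled by 1/sqrt(N), N = number of residual layers.

    base_std=0.02 is an assumed illustrative value.
    """
    std = base_std / np.sqrt(n_residual_layers)
    return rng.normal(0.0, std, size=(d_in, d_out))

rng = np.random.default_rng(0)
d_model = 16                              # toy width, chosen for illustration
x = rng.normal(size=(4, d_model))         # 4 token positions
w = init_residual_projection(rng, d_model, d_model, n_residual_layers=48)

def sublayer(h):
    """Hypothetical stand-in for an attention or feed-forward sub-layer."""
    return h @ w

pre_out = pre_norm_block(x, sublayer)
post_out = post_norm_block(x, sublayer)
```

In the pre-norm variant the input `x` passes through the block unchanged on the residual path, so `pre_out - x` equals the sub-layer output exactly. In the post-norm variant the summed signal is renormalized at every block. Scaling the initial weights by $1/\sqrt{N}$ keeps the accumulated residual contributions from growing as more layers are stacked.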