Concept

GPT-2

Introduced a year after its predecessor GPT, GPT-2 is a significantly larger Transformer-decoder language model with 1.5 billion parameters, pretrained on 40 GB of text. It introduced architectural refinements such as pre-normalization (applying layer normalization to the input of each sub-block rather than after the residual addition), as well as an improved initialization that scales down the weights of the residual layers to account for the growing depth of the model. GPT-2 was groundbreaking for achieving state-of-the-art results on language modeling benchmarks, and promising results on several other tasks, in a zero-shot setting: without any parameter updates or architectural modifications.
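
The refinements and the parameter count above can be illustrated with a small plain-Python sketch. It is not GPT-2's actual code. It assumes a layer norm without learned gain and bias, a residual-scaling factor of 1/sqrt(N) with N = 2 × n_layer (one attention and one MLP branch per block) and a base standard deviation of 0.02, as in common open-source reimplementations, and it assumes the largest GPT-2 configuration of 48 layers, a model width of 1,600, a 50,257-token vocabulary, a 1,024-token context and output weights tied to the token embedding. The helper names are hypothetical.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance.
    Simplification: the learned gain and bias of a real layer norm are omitted."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_norm_residual(x, sublayer):
    """Post-normalization (original Transformer/GPT ordering): add the residual, then normalize."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])


def pre_norm_residual(x, sublayer):
    """Pre-normalization (GPT-2 ordering): normalize only the sublayer's input,
    leaving the residual path itself un-normalized."""
    y = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]


def residual_init_std(n_layer, base_std=0.02):
    """Init std for projections that write into the residual stream, scaled by 1/sqrt(N).
    ASSUMPTIONS: N = 2 * n_layer and base_std = 0.02 (common reimplementation choices)."""
    return base_std / math.sqrt(2 * n_layer)


def gpt2_param_count(n_layer=48, d_model=1600, vocab=50257, n_ctx=1024):
    """Count trainable parameters of a GPT-2-style decoder, assuming the output
    layer shares (ties) its weights with the token embedding."""
    embeddings = vocab * d_model + n_ctx * d_model  # token + learned position embeddings
    attn = d_model * 3 * d_model + 3 * d_model      # fused Q/K/V projection + bias
    attn += d_model * d_model + d_model             # attention output projection + bias
    mlp = d_model * 4 * d_model + 4 * d_model       # expand to 4 * d_model + bias
    mlp += 4 * d_model * d_model + d_model          # project back to d_model + bias
    norms = 2 * 2 * d_model                         # two layer norms (gain + bias) per block
    final_norm = 2 * d_model                        # extra layer norm after the last block
    return embeddings + n_layer * (attn + mlp + norms) + final_norm


def zero_sublayer(v):
    """Toy sublayer that contributes nothing, to expose the residual path."""
    return [0.0] * len(v)


if __name__ == "__main__":
    print(pre_norm_residual([1.0, 2.0, 3.0], zero_sublayer))  # [1.0, 2.0, 3.0]
    print(residual_init_std(48))                             # scaled-down init std
    print(gpt2_param_count())                                # 1557611200
```

With a sublayer that outputs zeros, the pre-norm block returns its input unchanged, showing that the residual path stays an identity, whereas the post-norm block re-normalizes it. The count comes to about 1.56 billion, consistent with the roughly 1.5 billion parameters quoted above; exact published totals vary slightly with counting conventions.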

Updated 2026-05-15

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

D2L

Dive into Deep Learning @ D2L