Concept

Transformer

The Transformer is a deep learning architecture built exclusively on attention mechanisms, forgoing traditional recurrent or convolutional layers. A defining property of the Transformer is its superior scaling behavior: its performance consistently improves as the dataset size, model size, and computational budget increase. This architecture has become foundational, driving state-of-the-art results across natural language processing, computer vision, speech recognition, and reinforcement learning.
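The attention mechanism at the core of the architecture can be illustrated with scaled dot-product attention, the building block from which multi-head attention is assembled. Below is a minimal single-head sketch in NumPy; the array shapes and the toy random inputs are illustrative assumptions, not part of any particular implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention (minimal sketch).

    Q, K, V: arrays of shape (seq_len, d_k). Returns (seq_len, d_k).
    """
    d_k = Q.shape[-1]
    # Pairwise similarity between queries and keys, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors
    return weights @ V

# Toy example: 3 tokens with 4-dimensional queries/keys/values
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

In the full architecture this operation is applied in parallel across several heads with learned projection matrices, which is what lets the model attend to different positions and feature subspaces at once.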

Updated 2026-04-30

Tags

Data Science

Ch.1 Pre-training - Foundations of Large Language Models

Computing Sciences

Dive into Deep Learning @ D2L