Transformer Architecture Overview

The Transformer is an instance of the encoder-decoder architecture built on self-attention. Unlike attention-based models for standard sequence-to-sequence learning, which typically pair attention with recurrent layers, the Transformer adds positional encoding to the input (source) and output (target) sequence embeddings before feeding them into the encoder and decoder, respectively. Self-attention by itself does not capture token order, so the positional encoding supplies that information.
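The step of adding positional encoding to the embeddings can be sketched in a few lines of NumPy. This is a minimal illustration, not the D2L implementation: it assumes the fixed sinusoidal encoding, which is one common choice that the text above does not specify, and the names `sinusoidal_positional_encoding`, `add_positional_encoding`, `src_embeddings`, and `tgt_embeddings` are hypothetical. The random arrays stand in for learned token embeddings.

```python
import numpy as np


def sinusoidal_positional_encoding(num_positions: int, d_model: int) -> np.ndarray:
    """Return a (num_positions, d_model) matrix of fixed sinusoidal encodings.

    Assumption: sinusoidal encoding is used; learned positional embeddings
    would be an equally valid alternative.
    """
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")
    positions = np.arange(num_positions)[:, None]  # shape (num_positions, 1)
    # One frequency per pair of dimensions, shape (d_model / 2,)
    freqs = np.power(10000.0, -np.arange(0, d_model, 2) / d_model)
    angles = positions * freqs  # shape (num_positions, d_model / 2)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe


def add_positional_encoding(embeddings: np.ndarray) -> np.ndarray:
    """Add positional encoding to a (batch, seq_len, d_model) embedding array."""
    seq_len, d_model = embeddings.shape[-2:]
    # Broadcasting applies the same encoding to every sequence in the batch.
    return embeddings + sinusoidal_positional_encoding(seq_len, d_model)


# Placeholder embeddings: batch of 2, source length 5, target length 7, d_model 8.
rng = np.random.default_rng(0)
src_embeddings = rng.normal(size=(2, 5, 8))
tgt_embeddings = rng.normal(size=(2, 7, 8))

encoder_input = add_positional_encoding(src_embeddings)  # fed to the encoder
decoder_input = add_positional_encoding(tgt_embeddings)  # fed to the decoder
```

The same function serves both the source and target sides, since the encoding depends only on position and embedding size, not on which sequence it is applied to.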

Updated 2026-05-15

Tags

Data Science

D2L

Dive into Deep Learning @ D2L