Transformer Encoder Sublayers

Each layer in the Transformer encoder stack contains two sublayers: a multi-head self-attention pooling sublayer followed by a positionwise feed-forward network. In the encoder's self-attention, the queries, keys, and values all come from the outputs of the previous encoder layer.
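
The data flow through the two sublayers can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names (`matmul`, `rand_matrix`, `init_layer`, `encoder_layer`), the weight names (`W_q`, `W_k`, `W_v`, `W_o`, `W_1`, `W_2`), and the toy sizes are all hypothetical. Biases are left out, and so are the residual connections and layer normalization that full implementations wrap around each sublayer.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Hypothetical initializer: small uniform random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(x, params, num_heads):
    # Self-attention: queries, keys, and values are all projections of the
    # same input x, i.e. the output of the previous encoder layer.
    q = matmul(x, params["W_q"])
    k = matmul(x, params["W_k"])
    v = matmul(x, params["W_v"])
    d_head = len(x[0]) // num_heads
    head_outputs = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        k_h_t = [list(col) for col in zip(*k_h)]
        scores = matmul(q_h, k_h_t)
        # Scaled dot-product attention weights, one distribution per query.
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v_h))
    # Concatenate the heads along the feature axis, then mix them with W_o.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(x))]
    return matmul(concat, params["W_o"])


def positionwise_ffn(x, params):
    # The same two-layer network is applied to every position (row) independently.
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, params["W_1"])]
    return matmul(hidden, params["W_2"])


def encoder_layer(x, params, num_heads):
    # Simplification: no residual connections or layer normalization here.
    attended = multi_head_self_attention(x, params, num_heads)
    return positionwise_ffn(attended, params)


def init_layer(d_model, d_ff, rng):
    """Hypothetical parameter container for one encoder layer."""
    return {
        "W_q": rand_matrix(d_model, d_model, rng),
        "W_k": rand_matrix(d_model, d_model, rng),
        "W_v": rand_matrix(d_model, d_model, rng),
        "W_o": rand_matrix(d_model, d_model, rng),
        "W_1": rand_matrix(d_model, d_ff, rng),
        "W_2": rand_matrix(d_ff, d_model, rng),
    }


rng = random.Random(0)
seq_len, d_model, d_ff, num_heads, num_layers = 4, 8, 16, 2, 2
x = rand_matrix(seq_len, d_model, rng)  # stand-in for embedded input tokens
layers = [init_layer(d_model, d_ff, rng) for _ in range(num_layers)]

out = x
for params in layers:
    # Each layer consumes the output of the layer before it.
    out = encoder_layer(out, params, num_heads)

print(len(out), len(out[0]))  # 4 8: the shape is preserved through the stack
```

Because every layer maps a `seq_len` by `d_model` input to an output of the same shape, layers can be stacked to any depth, and the loop at the end makes explicit that each layer's queries, keys, and values come from the previous layer's output.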

Updated 2026-05-15

Tags

Data Science

D2L

Dive into Deep Learning @ D2L