Overview of a Transformer

Transformers are sequence-to-sequence models that consist of an encoder and a decoder, each of which is a stack of L identical blocks.
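To make the overall shape concrete, here is a minimal sketch using PyTorch's built-in nn.Transformer, whose encoder and decoder are each a stack of identical blocks; the hyperparameter values (512-dimensional model, 8 heads, 6 layers) are illustrative assumptions, not values given in the text above.

```python
import torch
import torch.nn as nn

# Encoder and decoder are each a stack of identical blocks (the "L" in the text).
model = nn.Transformer(
    d_model=512,            # width of every block
    nhead=8,                # attention heads per block
    num_encoder_layers=6,   # L encoder blocks
    num_decoder_layers=6,   # L decoder blocks
    dim_feedforward=2048,   # hidden width of each block's FFN
    batch_first=True,
)

src = torch.rand(2, 10, 512)  # (batch, source length, d_model)
tgt = torch.rand(2, 7, 512)   # (batch, target length, d_model)
out = model(src, tgt)         # -> shape (2, 7, 512)
```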

Each encoder block is composed of a multi-head self-attention module and a position-wise feed-forward network (FFN).
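A from-scratch sketch of one encoder block, assuming the residual connections and layer normalization around each sub-layer used in the original Transformer (the text above does not mention them); the class name EncoderBlock and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One encoder block: multi-head self-attention + position-wise FFN."""

    def __init__(self, d_model=512, nhead=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(        # position-wise FFN: the same two-layer
            nn.Linear(d_model, d_ff),    # MLP applied independently at every
            nn.ReLU(),                   # sequence position
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                        # x: (batch, seq_len, d_model)
        attn_out, _ = self.self_attn(x, x, x)    # queries, keys, values all = x
        x = self.norm1(x + attn_out)             # residual + layer norm
        return self.norm2(x + self.ffn(x))       # residual + layer norm

y = EncoderBlock()(torch.rand(2, 10, 512))       # -> shape (2, 10, 512)
```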

Each decoder block likewise contains a multi-head self-attention module and an FFN, with two differences: its self-attention is masked, so that each position can attend only to earlier positions, and it adds a cross-attention module that attends over the encoder's output.
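A matching sketch of one decoder block, under the same assumptions (residuals and layer norms; illustrative names and sizes). The boolean causal mask implements the masked self-attention, and memory stands for the encoder's output sequence.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One decoder block: masked self-attention, cross-attention over the
    encoder output, then a position-wise FFN."""

    def __init__(self, d_model=512, nhead=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, memory):
        # Masked (causal) self-attention: True entries are blocked, so
        # position i may only attend to positions <= i.
        t = x.size(1)
        causal = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
        )
        attn_out, _ = self.self_attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + attn_out)
        # Cross-attention: queries come from the decoder, keys/values
        # from the encoder's output.
        cross_out, _ = self.cross_attn(x, memory, memory)
        x = self.norm2(x + cross_out)
        return self.norm3(x + self.ffn(x))

memory = torch.rand(2, 10, 512)                    # encoder output
y = DecoderBlock()(torch.rand(2, 7, 512), memory)  # -> shape (2, 7, 512)
```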
