Decoder-Only Transformer Architecture

Decoder-only Transformers modify the original sequence-to-sequence Transformer architecture by removing the encoder entirely, along with the decoder sublayer that performs encoder-decoder cross-attention. Each remaining block therefore consists of masked (causal) self-attention followed by a position-wise feed-forward network. This streamlined architecture has become the de facto standard for large-scale language modeling, because it can be trained on large unlabeled text corpora through self-supervised learning.
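To make the structure concrete, here is a minimal sketch of a single decoder-only block in plain Python. It is an illustration under simplifying assumptions, not a reference implementation: it uses one attention head, omits layer normalization, dropout, and positional encodings, and every name in it (`decoder_only_block`, `causal_self_attention`, the `params` keys, the toy weights) is hypothetical. The point is that the block contains only two sublayers, masked self-attention and a feed-forward network, with no cross-attention between them.

```python
import math

def matmul(a, b):
    # Multiply two matrices given as lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    # Element-wise sum, used for the residual connections.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(x, w_q, w_k, w_v):
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    out = []
    for i, q_i in enumerate(q):
        # Causal mask: position i attends only to positions 0..i.
        scores = [sum(a * b for a, b in zip(q_i, k[j])) / scale for j in range(i + 1)]
        weights = softmax(scores)
        out.append([sum(w * v[j][d] for j, w in enumerate(weights))
                    for d in range(len(v[0]))])
    return out

def feed_forward(x, w_1, w_2):
    # Position-wise two-layer network with a ReLU in between.
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w_1)]
    return matmul(hidden, w_2)

def decoder_only_block(x, p):
    # Sublayer 1: masked self-attention plus residual connection.
    x = add(x, causal_self_attention(x, p["w_q"], p["w_k"], p["w_v"]))
    # Sublayer 2: feed-forward plus residual connection.
    # There is no encoder-decoder cross-attention sublayer in between.
    return add(x, feed_forward(x, p["w_1"], p["w_2"]))

# Toy weights (assumed values, model width 2, hidden width 3).
params = {
    "w_q": [[1.0, 0.0], [0.0, 1.0]],
    "w_k": [[1.0, 0.0], [0.0, 1.0]],
    "w_v": [[0.5, 0.0], [0.0, 0.5]],
    "w_1": [[1.0, -1.0, 0.5], [0.5, 1.0, -1.0]],
    "w_2": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
}
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three token embeddings
outputs = decoder_only_block(tokens, params)
```

Because of the causal mask, the output at each position depends only on that position and the ones before it. This is what lets the model be trained to predict each next token from unlabeled text.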

Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L