1Cademy - Architectural Categories of Pre-trained Transformers

Approach 1: Processes the input text sequentially, token by token, updating an internal state that is passed from one step to the next.
Approach 2: Processes all input tokens simultaneously, using a mechanism that directly relates every token to every other token in the input to determine context.
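The contrast between the two approaches can be made concrete with a small sketch. It assumes a toy scalar recurrent update for Approach 1 and a parameter-free scaled dot-product self-attention for Approach 2 (queries, keys, and values are the raw input vectors, with no learned projections). The function names and weights are hypothetical and chosen for illustration only; they do not come from any library.

```python
import math


def recurrent_encode(tokens, w_h=0.5, w_x=1.0):
    """Approach 1 (toy RNN): walk the tokens in order, carrying one hidden state.

    The state after step t depends only on tokens 0..t.
    """
    h = 0.0
    states = []
    for x in tokens:
        h = math.tanh(w_h * h + w_x * x)  # new state from old state + current token
        states.append(h)
    return states


def self_attention(vectors):
    """Approach 2 (toy self-attention): every position scores every position at once.

    Each output is a softmax-weighted average of all input vectors.
    """
    d = len(vectors[0])
    outputs = []
    for q in vectors:
        # Scaled dot-product score of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors (here, the inputs themselves).
        outputs.append([sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)])
    return outputs


if __name__ == "__main__":
    print(recurrent_encode([1.0, -1.0, 0.5]))
    print(self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

The usage lines show the practical difference: editing the last token leaves the earlier recurrent states untouched, because they were computed before that token was seen, while every self-attention output can change, because each position attends to the whole input.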

Architectural Categories of Pre-trained Transformers

Within natural language processing, pre-trained models based on the Transformer are commonly categorized by their underlying architecture. The primary categories, each of which serves as a target for self-supervised pre-training approaches, are encoder-only (e.g., BERT), decoder-only (e.g., GPT), and encoder-decoder (e.g., T5) structures.
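One way to see how the three categories differ is through the self-attention masks they typically use. The sketch below builds boolean masks in which True means "this position may attend to that position". It is a simplification under stated assumptions: it ignores padding and model-specific variants (for example, prefix language models), and the helper names are hypothetical rather than taken from any framework.

```python
def bidirectional_mask(n):
    """Encoder-only: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-only: position i may attend only to positions 0..i (no peeking ahead)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def encoder_decoder_masks(src_len, tgt_len):
    """Encoder-decoder: three attention patterns.

    - encoder self-attention is bidirectional over the source,
    - decoder self-attention is causal over the target,
    - cross-attention lets every target position see every source position.
    """
    return {
        "encoder_self": bidirectional_mask(src_len),
        "decoder_self": causal_mask(tgt_len),
        "cross": [[True] * src_len for _ in range(tgt_len)],
    }


if __name__ == "__main__":
    for row in causal_mask(3):
        print("".join("1" if allowed else "0" for allowed in row))
    # prints:
    # 100
    # 110
    # 111
```

The lower-triangular causal mask is what lets a decoder-only model be trained to predict each token from its left context, while the all-True mask gives an encoder-only model context from both sides of every token.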

Updated 2026-04-14
