Concept

Examples of Pre-trained Transformers by Architecture

Encoder-only -> natural language understanding

  • BERT
  • RoBERTa (drops BERT's next-sentence prediction (NSP) objective and trains longer on more data)
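
A minimal sketch of the encoder-only family in use, assuming the Hugging Face transformers package is installed (the model name and pipeline task are illustrative choices, not from the source):

```python
from transformers import pipeline

# Encoder-only model (BERT) predicting a masked token: the encoder reads the
# whole sentence bidirectionally, which is why this family suits understanding tasks.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for candidate in fill_mask("The capital of France is [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))
```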

Decoder-only -> language modeling

  • GPT series
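
A comparable sketch for the decoder-only family; GPT-2 is chosen here only as a small, openly available member of the GPT series:

```python
from transformers import pipeline

# Decoder-only model (GPT-2) continuing a prompt left to right: the model is
# trained purely on next-token prediction, i.e., language modeling.
generator = pipeline("text-generation", model="gpt2")

result = generator("Pre-trained transformers are", max_new_tokens=25)
print(result[0]["generated_text"])
```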

Encoder-decoder -> both natural language understanding and generation (e.g., summarization, translation)

  • BART
  • T5
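
A sketch for the encoder-decoder family, using T5's text-to-text interface (the task prefix matches how T5 was pre-trained; the model size is an illustrative choice):

```python
from transformers import pipeline

# Encoder-decoder model (T5): the encoder reads the full input (understanding),
# and the decoder generates the output sequence token by token (generation).
t2t = pipeline("text2text-generation", model="t5-small")

result = t2t("translate English to German: The book is on the table.")
print(result[0]["generated_text"])
```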

Updated 2026-05-02

Tags

  • Data Science
  • Computing Sciences

Related

  • Ch.1 Pre-training - Foundations of Large Language Models
  • Ch.2 Generative Models - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course