Comparison of Pre-training Tasks

Pre-training tasks can be compared based on their input-output transformations and their applicability to specific model architectures. Language modeling variants (such as causal LM and prefix LM) focus on sequential text generation and are typically applied to decoder-only and encoder-decoder models. Masked language modeling approaches (e.g., MASS-style and BERT-style masking) rely on reconstructing masked tokens and are compatible with both encoder-only and encoder-decoder architectures. Permuted language modeling and discriminative training methods (such as next sentence prediction, sentence comparison, and token classification) are specifically tailored to encoder-only models. Finally, denoising autoencoding encompasses tasks such as token reordering, token deletion, span masking, sentinel masking, sentence reordering, and document rotation, which train encoder-decoder models to reconstruct the original text from corrupted inputs.
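To make these input-output transformations concrete, the minimal Python sketch below builds toy (input, target) pairs for three of the objectives mentioned above: causal language modeling, BERT-style masked language modeling, and T5-style sentinel masking. It assumes a toy whitespace tokenizer and illustrative token names such as `[MASK]` and `<extra_id_0>`; real systems use subword tokenizers and model-specific special tokens.

```python
import random

# Toy whitespace "tokenizer" (assumption for illustration only).
tokens = "the quick brown fox jumps over the lazy dog".split()

# Causal LM (decoder-only style): predict each token from the ones before it.
causal_inputs = tokens[:-1]
causal_targets = tokens[1:]

# BERT-style masked LM (encoder-only style): replace a random subset of tokens
# with [MASK] and reconstruct only the masked positions.
random.seed(0)
mask_positions = set(random.sample(range(len(tokens)), k=2))
mlm_inputs = ["[MASK]" if i in mask_positions else t for i, t in enumerate(tokens)]
mlm_targets = {i: tokens[i] for i in sorted(mask_positions)}

# T5-style sentinel masking (encoder-decoder denoising): collapse a masked span
# into a sentinel token; the target emits the sentinel followed by the missing span.
span = (3, 5)  # mask tokens 3..4 ("fox jumps") as a single span
sentinel_inputs = tokens[:span[0]] + ["<extra_id_0>"] + tokens[span[1]:]
sentinel_targets = ["<extra_id_0>"] + tokens[span[0]:span[1]] + ["<extra_id_1>"]

if __name__ == "__main__":
    print("causal LM :", causal_inputs, "->", causal_targets)
    print("masked LM :", mlm_inputs, "->", mlm_targets)
    print("sentinel  :", sentinel_inputs, "->", sentinel_targets)
```

The same raw text yields different supervision signals depending on the objective, which is why each task pairs naturally with a particular architecture: left-to-right prediction suits decoders, position-wise reconstruction suits encoders, and corrupted-input-to-clean-output mappings suit encoder-decoder models.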
