Learn Before
Comparison of Pre-training Tasks
Pre-training tasks can be compared by the input-output transformation they define and by the model architectures they suit. Language modeling variants (such as Causal and Prefix LM) focus on sequential text generation and are typically applied to decoder-only and encoder-decoder models. Masked language modeling approaches (e.g., MASS-style, BERT-style) reconstruct masked tokens and are compatible with encoder-only and encoder-decoder architectures. Permuted language modeling and discriminative training methods (such as Next Sentence Prediction, Sentence Comparison, and Token Classification) are tailored to encoder-only models. Finally, denoising autoencoding covers corruption schemes such as token reordering, token deletion, span masking, sentinel masking, sentence reordering, and document rotation; these tasks train encoder-decoder models to reconstruct the original text from the corrupted input.
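To make the input-output contrast concrete, below is a minimal Python sketch (not part of the course materials) that builds toy training pairs for three of the objectives above: causal language modeling, BERT-style masked language modeling, and sentinel masking. It works on whitespace-split tokens rather than a real tokenizer, and the helper names (causal_lm_pairs, bert_style_masking, sentinel_span_masking), the special-token strings, and the 15% mask rate are illustrative assumptions, not any library's API.

import random

# Toy special tokens (hypothetical; real tokenizers define their own).
MASK, SENTINEL = "[MASK]", "<extra_id_0>"

def causal_lm_pairs(tokens):
    """Causal LM: predict each token from the tokens to its left."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def bert_style_masking(tokens, mask_rate=0.15, seed=0):
    """BERT-style MLM: hide roughly mask_rate of tokens; the targets are the originals."""
    rng = random.Random(seed)
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            corrupted[i] = MASK
            targets[i] = tok
    return corrupted, targets

def sentinel_span_masking(tokens, start, length):
    """Sentinel masking: replace a contiguous span with one sentinel token;
    an encoder-decoder model learns to regenerate the span from the sentinel."""
    corrupted = tokens[:start] + [SENTINEL] + tokens[start + length:]
    target = [SENTINEL] + tokens[start:start + length]
    return corrupted, target

if __name__ == "__main__":
    toks = "the model learns language structure from raw text".split()
    print(causal_lm_pairs(toks)[:2])          # left-to-right prediction pairs
    print(bert_style_masking(toks))           # masked input plus recovery targets
    print(sentinel_span_masking(toks, 2, 3))  # corrupted input plus span target

Note how the three outputs map onto the architectures described above: the left-to-right pairs suit a decoder-only model, the masked-token targets suit an encoder-only model, and the sentinel example gives an encoder-decoder model a corrupted input and a target sequence to generate.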
Tags
Foundations of Large Language Models
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Contrastive Learning (CTL)
Extensions of PTMs
Applying and Adapting Pre-trained Models to Downstream Tasks
Unsupervised Pre-training
Supervised Pre-training
Self-Supervised Learning
Comparison of Pre-training Paradigms
Rationale for Categorizing Pre-training Tasks by Objective
Denoising Autoencoding
Comparability of Pre-training Tasks
Generality of Pre-training Tasks and Performance
Applying Pre-trained Models to Downstream Tasks
Identifying a Pre-training Strategy
Breadth of Pre-training Tasks
A research team is developing a new language model and is considering different pre-training approaches. Match each pre-training scenario below with the correct category of learning it represents.
A language model is being trained on a large corpus of text from the internet. The training process involves randomly hiding 15% of the words in each sentence and then tasking the model with predicting the original identity of these hidden words based on the surrounding context. Which category of pre-training task does this scenario best exemplify, and why?
Comparing Pre-training Task Categories