Learn Before
Comparison of Pre-training Paradigms
Pre-training paradigms differ primarily in their data requirements, underlying assumptions, and adaptation methods. Unsupervised pre-training uses large-scale unlabeled data to give the model a good starting point, but still requires considerable further training on labeled data. Supervised pre-training relies on labeled datasets from the outset and assumes that different tasks are related, so a model trained on one task can be transferred to another through tuning. In contrast, self-supervised pre-training leverages large-scale unlabeled data to train a model that can then be efficiently adapted to new tasks through methods such as fine-tuning or prompting.
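Where the training signal comes from is the key practical difference. As a minimal, illustrative sketch in Python (the 15% mask rate, the toy sentence, and the tiny labeled set below are assumptions for illustration, not any particular model's recipe), a self-supervised objective such as masked-token prediction manufactures its own labels from unlabeled text, whereas supervised pre-training needs annotator-provided labels up front:

import random

def make_mlm_example(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Self-supervised: the targets are taken from the unlabeled text itself."""
    inputs, targets = [], []
    for tok in tokens:
        if random.random() < mask_rate:
            inputs.append(mask_token)  # hide the token in the model's input
            targets.append(tok)        # the original token becomes the label
        else:
            inputs.append(tok)
            targets.append(None)       # nothing to predict at this position
    return inputs, targets

# Unlabeled corpus: the "labels" are simply the tokens that were masked out.
sentence = "large language models are pre-trained on unlabeled text".split()
print(make_mlm_example(sentence))

# Supervised pre-training, by contrast, needs human-labeled pairs from the start,
# e.g. (text, class) examples for a task such as sentiment classification:
labeled_data = [("a wonderful film", "positive"), ("a dull plot", "negative")]

Because the masked-token labels are generated automatically, this kind of objective scales to web-sized unlabeled corpora, which is what makes the resulting model comparatively cheap to adapt afterwards through fine-tuning or prompting.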
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Contrastive Learning (CTL)
Extensions of PTMs
Applying and Adapting Pre-trained Models to Downstream Tasks
Unsupervised Pre-training
Supervised Pre-training
Self-Supervised Learning
Comparison of Pre-training Paradigms
Rationale for Categorizing Pre-training Tasks by Objective
Denoising Autoencoding
Comparability of Pre-training Tasks
Generality of Pre-training Tasks and Performance
Applying Pre-trained Models to Downstream Tasks
Identifying a Pre-training Strategy
Breadth of Pre-training Tasks
A research team is developing a new language model and is considering different pre-training approaches. Match each pre-training scenario below with the correct category of learning it represents.
A language model is being trained on a large corpus of text from the internet. The training process involves randomly hiding 15% of the words in each sentence and then tasking the model with predicting the original identity of these hidden words based on the surrounding context. Which category of pre-training task does this scenario best exemplify, and why?
Comparing Pre-training Task Categories
Comparison of Pre-training Tasks
Learn After
Selecting a Model Training Strategy
Match each pre-training paradigm with the description that best characterizes its data requirements and common adaptation methods.
A research lab has access to a vast corpus of unlabeled text from the internet but has a very limited budget for creating task-specific labeled datasets. Their goal is to develop a foundational model that can be flexibly adapted to a wide variety of future tasks, often with only a few examples for each new task. Which pre-training paradigm would be the most strategic choice for this lab?