Learn Before
Comparison of Self-Supervised Pre-training and Self-Training
The key distinction between self-supervised pre-training in NLP and traditional self-training lies in whether an initial model is required. Self-training requires an initial model, trained on a small seed of labeled data, to generate pseudo labels for unlabeled data. In contrast, self-supervised pre-training needs no initial model: it generates all supervision signals directly from the raw text and trains the entire model from scratch.
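To make the contrast concrete, here is a minimal sketch, not taken from the source, of how a masked-word-prediction task turns raw, unlabeled text into (input, target) training pairs; the helper name make_mlm_example and the 15% masking rate are illustrative assumptions. Every supervision signal comes from the text itself, so no seed-labeled dataset or initial model is involved.

```python
# Minimal sketch (illustrative, not the source's implementation) of deriving
# supervision directly from raw text: randomly mask words and use the original
# words as prediction targets. No initial model or human labels are needed.
import random

MASK = "[MASK]"

def make_mlm_example(sentence, mask_prob=0.15, seed=None):
    """Turn one raw sentence into a (masked input, targets) training pair."""
    rng = random.Random(seed)
    tokens = sentence.split()
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)   # the model sees the mask token...
            targets.append(tok)   # ...and must predict the original word
        else:
            inputs.append(tok)
            targets.append(None)  # no prediction loss on unmasked positions
    return inputs, targets

inputs, targets = make_mlm_example(
    "the model learns language structure from unlabeled text", seed=0)
print(inputs)   # e.g. some words replaced by '[MASK]'
print(targets)  # original words at masked positions, None elsewhere
```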
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Self-Supervised Pre-training and Self-Training
Architectural Categories of Pre-trained Transformers
Self-Supervised Classification Tasks for Encoder Training
Prefix Language Modeling (PrefixLM)
Mask-Predict Framework
Discriminative Training
Learning World Knowledge from Unlabeled Data
Emergent Linguistic Capabilities from Pre-training
Architectural Approaches to Self-Supervised Pre-training
Self-Supervised Pre-training of Encoders via Masked Language Modeling
Word Prediction as a Core Self-Supervised Task
Learning World Knowledge from Unlabeled Data via Self-Supervision
A research team has a massive collection of unlabeled historical texts. Their goal is to pre-train a language model that understands the specific vocabulary and sentence structures within these documents, but they have no budget for manual data annotation. Which of the following approaches is the most effective and feasible for their pre-training task?
Analysis of Supervision Signal Generation
A team is developing a pre-training strategy for a new language model using a large corpus of unlabeled text. Which of the following proposed tasks best exemplifies the principles of self-supervised learning?
Prevalence of Self-Supervised Pre-training in NLP
Historical Applications of Self-Training
Comparison of Self-Supervised Pre-training and Self-Training
A machine learning team is implementing a self-training procedure to improve a text classification model. They begin by training an initial model on a small, high-quality labeled dataset. They then use this model to predict labels for a vast collection of unlabeled text, creating 'pseudo labels'. Finally, they retrain the model on a combination of the original labeled data and the newly pseudo-labeled data. Which of the following describes the most critical risk inherent to this self-training approach?
A machine learning team has a small set of high-quality labeled data and a very large set of unlabeled data. They decide to use an iterative approach to improve their model's performance. Arrange the core steps of this process in the correct chronological order.
Evaluating a Model Training Strategy
Learn After
A research team is considering two different training strategies to build a language model using a large corpus of unlabeled text. Strategy A involves first training a preliminary model on a small, human-labeled 'seed' dataset, then using that model's predictions to create labels for the unlabeled text, and finally retraining the model on this newly labeled data. Strategy B involves no initial seed dataset; instead, it creates training tasks directly from the unlabeled text itself (e.g., by masking words and training the model to predict them) to learn from the data's inherent structure. Which statement best analyzes the fundamental difference in how these two strategies initiate the learning process?
Choosing a Training Methodology for a Foundational Model
A key difference between self-training and self-supervised pre-training is that self-training requires an initial model trained on a small set of labeled data to begin the learning process, whereas self-supervised pre-training can start with a randomly initialized model and only unlabeled data.
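For contrast with the statement above, here is a minimal sketch, not from the source, of the self-training loop using scikit-learn models purely for illustration (the helper self_train is hypothetical). It shows why a seed-labeled dataset and an initial model must come first in self-training.

```python
# Minimal sketch (illustrative assumption, not the source's method) of the
# classic self-training loop: an initial model must be fit on seed labeled
# data before any pseudo labels can be produced.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def self_train(seed_texts, seed_labels, unlabeled_texts):
    vectorizer = CountVectorizer()
    X_seed = vectorizer.fit_transform(seed_texts)

    # Step 1: train an initial model on the small, high-quality labeled seed set.
    model = LogisticRegression(max_iter=1000).fit(X_seed, seed_labels)

    # Step 2: use the initial model to assign pseudo labels to the unlabeled text.
    X_unlabeled = vectorizer.transform(unlabeled_texts)
    pseudo_labels = model.predict(X_unlabeled)  # mistakes here can be reinforced later

    # Step 3: retrain on the original labels plus the pseudo labels.
    X_all = vectorizer.transform(list(seed_texts) + list(unlabeled_texts))
    y_all = list(seed_labels) + list(pseudo_labels)
    return LogisticRegression(max_iter=1000).fit(X_all, y_all)
```

In self-supervised pre-training, by contrast, step 1 disappears: the model starts from random initialization and its training pairs are built directly from the unlabeled text, as in the masking sketch near the top of this page.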