Learn Before
Disadvantages of Supervised Pre-training
The primary drawback of supervised pre-training is its significant requirement for labeled data. As the complexity of neural networks increases, the volume of labeled data needed for effective pre-training also rises, making the approach challenging and difficult to apply when large-scale labeled datasets are not available.
0
1
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Process of Adapting a Supervised Pre-trained Model
Advantages of Supervised Pre-training
Disadvantages of Supervised Pre-training
Example of a Supervised Pre-training Task
A startup is building a system to automatically categorize legal contracts into specific sub-types (e.g., 'lease agreement', 'employment contract', 'non-disclosure agreement'). They have a very small, private dataset of 500 labeled contracts. Their proposed strategy is to first train a large neural network on a massive, publicly available dataset of millions of labeled news articles, classifying them by topic (e.g., 'sports', 'politics', 'technology'). After this initial training, they plan to adapt the model to their legal contract categorization task. What is the most significant weakness of this proposed pre-training approach for their specific goal?
A machine learning engineer wants to use a supervised pre-training approach to build a model that can detect toxic comments online. Arrange the following steps in the correct chronological order to reflect this process.
Evaluating a Pre-training Strategy for Scientific Text
Assumption of Supervised Pre-training
Learn After
Pre-training Strategy Analysis
A research team aims to build a highly complex neural network for understanding a niche technical domain. They have access to a massive corpus of unlabeled technical documents but only a small, curated dataset of 5,000 documents that have been manually categorized. If the team decides to first train their model on this small, labeled dataset before adapting it to other tasks, what is the primary limitation inherent to this initial training approach?
Evaluating a Pre-training Strategy for a Niche Domain