Learn Before
Example of a Supervised Pre-training Task
A common example of a supervised pre-training task is sentiment classification. In this setup, a system composed of a sequence model and a classification layer is trained to identify whether a given sentence expresses a positive or negative sentiment. The representations learned during this task can then be leveraged for other downstream applications.
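The two-phase process described above can be sketched in miniature. This is an illustrative toy only, not from the source: the `Encoder` (a single learned layer standing in for a sequence model), the bag-of-words `featurize` function, and the perceptron-style update are all simplifying assumptions; a real system would train a neural sequence model with gradient descent on far more data.

```python
# Toy sketch: supervised pre-training on sentiment labels, then reusing
# the learned representation for a new downstream task.
# All names and the training rule are illustrative assumptions.

VOCAB = ["good", "great", "love", "bad", "awful", "hate"]

def featurize(sentence):
    """Bag-of-words counts over a tiny fixed vocabulary."""
    words = sentence.lower().split()
    return [float(words.count(w)) for w in VOCAB]

class Encoder:
    """Stands in for the sequence model: one elementwise scaling layer."""
    def __init__(self, dim):
        self.scale = [1.0] * dim

    def encode(self, x):
        return [s * xi for s, xi in zip(self.scale, x)]

class Head:
    """Task-specific classification layer on top of the encoder output."""
    def __init__(self, dim):
        self.w = [0.0] * dim

    def predict(self, h):
        return 1 if sum(wi * hi for wi, hi in zip(self.w, h)) > 0 else 0

# --- Phase 1: supervised pre-training on sentiment classification ---
data = [("a good great movie", 1), ("i love it", 1),
        ("a bad awful movie", 0), ("i hate it", 0)]

enc = Encoder(len(VOCAB))
sentiment_head = Head(len(VOCAB))
for _ in range(10):                      # perceptron-style epochs
    for sentence, label in data:
        h = enc.encode(featurize(sentence))
        if sentiment_head.predict(h) != label:
            step = 1.0 if label == 1 else -1.0
            sentiment_head.w = [wi + step * hi
                                for wi, hi in zip(sentiment_head.w, h)]

# --- Phase 2: discard the head, keep the encoder for a new task ---
downstream_head = Head(len(VOCAB))       # fresh layer for the new task
features = enc.encode(featurize("a good contract"))
# `features` is the transferable representation the downstream
# classifier would now be trained on.
```

The key structural point is in phase 2: the sentiment-specific head is thrown away, while the encoder, whose parameters were shaped by the pre-training labels, is kept as the starting point for the downstream application.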
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Process of Adapting a Supervised Pre-trained Model
Advantages of Supervised Pre-training
Disadvantages of Supervised Pre-training
Example of a Supervised Pre-training Task
A startup is building a system to automatically categorize legal contracts into specific sub-types (e.g., 'lease agreement', 'employment contract', 'non-disclosure agreement'). They have a very small, private dataset of 500 labeled contracts. Their proposed strategy is to first train a large neural network on a massive, publicly available dataset of millions of labeled news articles, classifying them by topic (e.g., 'sports', 'politics', 'technology'). After this initial training, they plan to adapt the model to their legal contract categorization task. What is the most significant weakness of this proposed pre-training approach for their specific goal?
A machine learning engineer wants to use a supervised pre-training approach to build a model that can detect toxic comments online. Arrange the following steps in the correct chronological order to reflect this process.
Evaluating a Pre-training Strategy for Scientific Text
Assumption of Supervised Pre-training
Learn After
A development team first trains a large neural network to classify thousands of user reviews as either 'positive' or 'negative'. After this initial training is complete, they remove the final classification layer and use the rest of the trained network as a starting point to build a new system for automatically generating marketing slogans. What is the most significant advantage of this two-step approach?
A machine learning engineer wants to use a large dataset of customer support emails, each labeled as 'urgent' or 'not urgent', to pre-train a model. The ultimate goal is to create a new system that can automatically categorize incoming emails into more specific topics (e.g., 'billing issue', 'technical problem', 'feedback'). Arrange the following steps in the correct logical order to accomplish this.
An e-commerce company is building a chatbot to answer customer questions about product specifications. They have a small, high-quality dataset of 5,000 question-answer pairs specific to their products, but find the chatbot struggles to understand the nuances of customer phrasing. The company also has access to a massive dataset of 2 million online forum posts, where each post has been labeled by moderators as either 'question' or 'statement'. Which of the following strategies describes the most appropriate way to use the forum post dataset to improve the final chatbot?