Learn Before
Inadequacy of Pre-trained Model Parameters for Downstream Tasks
Because the initial parameters of a pre-trained model—specifically the classifier parameters and the encoder parameters—are not originally optimized for a specific downstream classification task, the model cannot be applied directly. Instead, a modified version of the model must be created and adapted before it can achieve accurate results on the new task.
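A minimal sketch of this situation, assuming a toy encoder and head (the class names, the 4-dimensional vectors, and the hashing trick below are all illustrative stand-ins, not any real library's API): the encoder is reused as-is, while the classifier head starts with task-agnostic (here, random) weights, so the combined model cannot be trusted on the new task until it is adapted.

```python
import random

random.seed(0)

class PretrainedEncoder:
    """Stands in for a pre-trained encoder: maps text to a fixed-size vector.
    Its parameters were optimized for the pre-training objective, not for
    any particular downstream classification task."""
    def encode(self, text):
        # Toy deterministic-shape "embedding": hash tokens into 4 buckets.
        vec = [0.0] * 4
        for tok in text.lower().split():
            vec[hash(tok) % 4] += 1.0
        return vec

class ClassifierHead:
    """A fresh task-specific head. Its weights start random, so predictions
    are arbitrary until the head (and optionally the encoder) is trained
    on downstream task data."""
    def __init__(self, num_classes, dim=4):
        self.weights = [[random.uniform(-1, 1) for _ in range(dim)]
                        for _ in range(num_classes)]
    def predict(self, vec):
        scores = [sum(w * x for w, x in zip(row, vec)) for row in self.weights]
        return scores.index(max(scores))

# Adapted model = reused pre-trained encoder + new classifier head.
encoder = PretrainedEncoder()
head = ClassifierHead(num_classes=3)
label = head.predict(encoder.encode("new phone released today"))
```

The point of the sketch is the division of labor: adaptation means attaching (and then training) the new head, rather than applying the pre-trained model unmodified.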
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Encoder Function in Machine Learning
Classifier Head
Inadequacy of Pre-trained Model Parameters for Downstream Tasks
A machine learning model is built to automatically categorize news articles into topics like 'Sports', 'Technology', or 'Politics'. The model first reads the raw text of an article and converts it into a fixed-size numerical vector that summarizes the article's content. This numerical summary is then used to decide which topic the article belongs to. Arrange the following steps to accurately represent the flow of information through this model.
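The flow described above can be sketched as a two-stage pipeline (a toy sketch: the keyword-overlap vector below is a hypothetical stand-in for a learned encoder, chosen only to make the fixed-size summary concrete):

```python
TOPICS = ["Sports", "Technology", "Politics"]
KEYWORDS = {
    "Sports": {"match", "goal"},
    "Technology": {"chip", "software"},
    "Politics": {"election", "senate"},
}

def encode(article):
    """Stage 1: read the raw text and summarize it as a fixed-size
    numerical vector (here, one keyword-overlap count per topic)."""
    tokens = set(article.lower().split())
    return [len(tokens & KEYWORDS[topic]) for topic in TOPICS]

def classify(vector):
    """Stage 2: use only the numerical summary to pick the topic."""
    return TOPICS[vector.index(max(vector))]

vec = encode("The senate passed the election reform bill")
print(len(vec))        # → 3 (fixed-size, regardless of article length)
print(classify(vec))   # → Politics
```

Note that the classifier never sees the raw text: if the encoder's vectors fail to separate two categories, no classifier head can recover the distinction, which is the situation probed in the sarcasm question below.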
A model is designed to classify text into categories. It works in two stages: first, it generates a fixed-length numerical vector that represents the meaning of the input text. Second, a separate component uses this vector to predict the final category. The model performs poorly on a task requiring it to distinguish between sincere and sarcastic statements. The developers suspect that the numerical vectors for sarcastic statements are not distinct enough from those for sincere statements. Which stage of the model is the primary source of this problem?
Reusability in a Two-Stage Classification Model
Learn After
Adaptation of Pre-trained Models via Full Fine-Tuning
Freezing Encoder Parameters During Fine-Tuning
Evaluating the Direct Application of a General Language Model
A team develops a large language model by training it on a vast collection of text from the internet, with the sole objective of making it proficient at predicting the next word in a sequence. They then attempt to use this model directly, without any changes, to categorize customer support emails into 'Billing Issue', 'Technical Problem', or 'Feature Request'. The model performs poorly. Which of the following statements best explains this outcome?
Mismatch Between Pre-training and Downstream Objectives