Comparing Model Architectures for Different NLP Tasks
A team is using a pre-trained language model for two different tasks. Task A is sentiment analysis, classifying a movie review as either 'positive' or 'negative'. Task B is predicting a 'readability score' for a news article, on a continuous scale from 0.0 to 100.0. Analyze the fundamental difference required in the model's final prediction network (the 'head') to handle Task A versus Task B. Explain why this difference is necessary based on the nature of each task's output.
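To make the contrast concrete, here is a minimal PyTorch-style sketch (not from the course material) of the two heads. The hidden size of 768 and the sample labels and scores are illustrative placeholders; the point is only the shape and role of each head's output.

```python
import torch
import torch.nn as nn

hidden_size = 768  # typical encoder output width; placeholder value

# Task A: sentiment classification -- the head maps the pooled
# encoder output to 2 logits, one per discrete class.
classification_head = nn.Linear(hidden_size, 2)

# Task B: readability regression -- the head maps the same pooled
# output to a single real-valued score.
regression_head = nn.Linear(hidden_size, 1)

pooled = torch.randn(4, hidden_size)  # stand-in for pooled encoder outputs

logits = classification_head(pooled)         # shape (4, 2)
probs = logits.softmax(dim=-1)               # probability distribution over classes
score = regression_head(pooled).squeeze(-1)  # shape (4,), continuous score

# The training objective differs accordingly: cross-entropy for the
# discrete classes of Task A, mean-squared error for the continuous
# targets of Task B (sample targets below are made up).
ce_loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 1, 0]))
mse_loss = nn.MSELoss()(score, torch.tensor([62.5, 40.0, 88.1, 17.3]))
```

The softmax in the classification head forces the output to be a probability distribution over a fixed set of labels, whereas the regression head leaves its single output unbounded so it can take any value on the continuous scale.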
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Sentence Similarity Calculation using BERT-based Regression
Illustration of BERT for Text-Pair Tasks (Classification and Regression)
Training BERT-based Regression Models via Loss Minimization
Adapting a Language Model for a New Task
A data science team has a pre-trained transformer model that has been successfully fine-tuned for a text classification task, predicting whether a product review is 'positive' or 'negative'. They now want to adapt this model for a new regression task: predicting a continuous 'star rating' for reviews, on a scale from 1.0 to 5.0. Which of the following modifications represents the most direct and essential change to the model's architecture to enable this new task?
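As a hedged sketch of what that modification looks like in practice, assuming a PyTorch-style model: the ReviewModel class, the nn.Identity placeholder encoder, and the hidden size of 768 below are hypothetical stand-ins, not the team's actual code. The pre-trained encoder weights are kept; only the output layer is replaced.

```python
import torch
import torch.nn as nn

class ReviewModel(nn.Module):
    """Stand-in for the fine-tuned model: a pre-trained encoder plus a head."""
    def __init__(self, encoder: nn.Module, hidden_size: int = 768):
        super().__init__()
        self.encoder = encoder                 # pre-trained transformer body
        self.head = nn.Linear(hidden_size, 2)  # 'positive'/'negative' logits

    def forward(self, x):
        return self.head(self.encoder(x))

# The most direct architectural change: keep the encoder, replace only the
# final layer so the model emits one continuous value instead of two logits.
model = ReviewModel(encoder=nn.Identity())  # nn.Identity() is a placeholder encoder
model.head = nn.Linear(768, 1)

features = torch.randn(4, 768)               # stand-in for encoded reviews
star_rating = model(features).squeeze(-1)    # one real-valued rating per review
```

The loss function would change in tandem, from cross-entropy over two classes to a regression loss such as mean-squared error against the true star ratings.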