Learn Before
Representative Transformer-based PLMs
Transformer-based models have become the mainstream approach to pre-trained language models (PLMs) because of their superior performance. These models are pre-trained on extensive text corpora with word prediction objectives such as masked language modeling (MLM), and once pre-trained they can be adapted to a wide range of downstream NLP tasks. PLMs are commonly grouped into three main architectural types: auto-encoding (AE) models, auto-regressive (AR) models, and sequence-to-sequence (seq2seq) models.
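As a concrete illustration of the pre-train-then-adapt idea, the minimal sketch below loads a publicly available masked language model and asks it to fill in a masked token, the same word prediction task it was pre-trained on. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint, neither of which is prescribed by this topic; any comparable auto-encoding PLM would behave the same way.

```python
# Minimal sketch: querying a pre-trained masked language model (MLM).
# Assumption: the Hugging Face `transformers` library is installed and the
# `bert-base-uncased` checkpoint is used purely as an example model.
from transformers import pipeline

# Load a fill-mask pipeline backed by a pre-trained auto-encoding model.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model predicts the token hidden behind the [MASK] placeholder.
predictions = fill_mask("Transformer-based models are pre-trained on large text [MASK].")

for p in predictions:
    print(f"{p['token_str']!r}: {p['score']:.3f}")
```

Adapting the same pre-trained backbone to a downstream task such as text classification typically replaces the word-prediction head with a task-specific head and fine-tunes the model on labeled data, rather than training a new model from scratch.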
Tags
Deep Learning (in Machine Learning)
Data Science
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Controlled text generation using PLMs
Analysis of Language Model Training Strategies
A startup is developing a system to classify medical research abstracts into different fields of study (e.g., cardiology, oncology, neurology). They have a limited dataset of 10,000 labeled abstracts. Which of the following statements best justifies the decision to use a large, pre-trained language model and fine-tune it, rather than training a new model from scratch on their dataset?
A development team is building a system to classify news articles into categories like 'Sports', 'Technology', and 'Politics'. They are using a modern approach that starts with a large, general-purpose language model. Arrange the following stages of their development process into the correct chronological order.
Traditional Role of Language Models
LLMs as Complete Systems in Generative AI
Learn After
Auto-Encoding (AE) Models
Auto-Regressive (AR) Models
Seq2seq Models for Text Generation
An engineering team is tasked with creating a system to analyze customer reviews and automatically classify them as 'positive', 'negative', or 'neutral'. The most critical requirement is for the model to have a deep, holistic understanding of the entire review's context to make an accurate classification. Which of the following architectural approaches for a pre-trained model would be best suited for this task?
You are an NLP engineer selecting a pre-trained model architecture for three different projects. Match each project description to the most suitable underlying model training objective.
Model Architecture Selection Flaw