Learn Before
Auto-Regressive (AR) Models
The main task of these models is to predict the next word given the previous words; the GPT model family is their best-known representative. AR architectures are built from the Transformer's decoder, so a masking mechanism is applied during training that lets the attention calculation for each word see only the content before it. This left-to-right design makes them well suited to natural language generation (NLG) tasks.
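As a concrete illustration of that masking mechanism, here is a minimal single-head self-attention sketch (using PyTorch as an assumed framework; the function name causal_self_attention and the toy shapes are illustrative, not part of the original card). Scores for future positions are set to negative infinity, so after the softmax each word places zero attention weight on anything that comes after it:

    import torch
    import torch.nn.functional as F

    def causal_self_attention(x):
        """Single-head self-attention with a causal (look-back-only) mask.

        x: (seq_len, d_model) token representations.
        Returns contextualized representations in which position i attends
        only to positions <= i, as in a Transformer decoder.
        """
        seq_len, d_model = x.shape
        # For brevity, x serves as queries, keys, and values; a real decoder
        # block would apply learned linear projections first.
        scores = (x @ x.T) / d_model ** 0.5                    # (seq_len, seq_len)
        # Causal mask: entries above the diagonal (future positions) get -inf,
        # so softmax assigns them zero attention weight.
        future = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
        scores = scores.masked_fill(future, float("-inf"))
        weights = F.softmax(scores, dim=-1)                    # each row sums to 1
        return weights @ x

    # Toy check: with 4 tokens, the weight matrix is lower-triangular,
    # i.e. the representation of token 2 never depends on tokens 3 and 4.
    tokens = torch.randn(4, 8)
    print(causal_self_attention(tokens).shape)  # torch.Size([4, 8])

The lower-triangular attention pattern is exactly why these models can be sampled left to right at generation time: nothing in a prefix's representation depends on words that have not been produced yet.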
Tags
Deep Learning (in Machine Learning)
Foundations of Large Language Models Course
Data Science
Computing Sciences
Related
Auto-Encoding (AE) Models
Seq2seq Models for Text Generation
An engineering team is tasked with creating a system to analyze customer reviews and automatically classify them as 'positive', 'negative', or 'neutral'. The most critical requirement is for the model to have a deep, holistic understanding of the entire review's context to make an accurate classification. Which of the following architectural approaches for a pre-trained model would be best suited for this task?
You are an NLP engineer selecting a pre-trained model architecture for three different projects. Match each project description to the most suitable underlying model training objective.
Model Architecture Selection Flaw
Learn After
Chain Rule of Probability for Auto-regressive Language Models
Permuted Language Modeling (PLM)
A language model is being trained on the sentence: 'The quick brown fox jumps over the lazy dog.' The model's primary purpose is to generate new text by predicting the next word in a sequence based only on the words that came before it. When the model is calculating the representation for the word 'jumps' during this process, which part of the sentence is it allowed to consider?
Permuted Language Modeling
Model Architecture Suitability for Sentiment Analysis
Rationale for Auto-Regressive Model Design in Text Generation