Permuted Language Modeling (PLM)
Permuted Language Modeling (PLM) is a pre-training approach designed to resolve specific issues found in Masked Language Modeling, such as the mismatch between pre-training and fine-tuning (the [MASK] symbol never appears in downstream data) and the independence assumption among masked tokens. Although PLM is a sequential prediction task, the actual order of tokens in the original text remains completely unchanged. Instead, the model predicts the tokens auto-regressively according to a randomly sampled permutation of their positions, so each token is conditioned on all tokens that precede it in that permuted order.
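The idea above can be sketched in a few lines of Python. This is a minimal illustration (not from the course material): the helper `plm_prediction_order` is a hypothetical name, and real PLM implementations such as XLNet realize the permutation through attention masks rather than by literally reordering predictions.

```python
import random

def plm_prediction_order(tokens, seed=0):
    """Return the steps a PLM would take when pre-training on `tokens`.

    The original token order is untouched; only the *prediction* order is
    permuted. At step t the model may condition on every token that was
    revealed at an earlier step of the permutation.
    """
    rng = random.Random(seed)
    positions = list(range(len(tokens)))
    rng.shuffle(positions)          # an arbitrarily sampled factorization order
    steps, visible = [], []
    for pos in positions:
        steps.append({"position": pos,
                      "predict": tokens[pos],
                      "context": list(visible)})  # tokens revealed so far
        visible.append(tokens[pos])
    return steps

# Example: the model might predict 'learns' before ever seeing 'model'.
for step in plm_prediction_order("The model learns from text".split()):
    print(step)
```

Note that every token is eventually predicted exactly once, and the context grows by one token per step, which is what makes the objective a valid auto-regressive factorization of the full sequence probability.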
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Chain Rule of Probability for Auto-regressive Language Models
Permuted Language Modeling (PLM)
A language model is being trained on the sentence: 'The quick brown fox jumps over the lazy dog.' The model's primary purpose is to generate new text by predicting the next word in a sequence based only on the words that came before it. When the model is calculating the representation for the word 'jumps' during this process, which part of the sentence is it allowed to consider?
Permuted Language Modeling
Model Architecture Suitability for Sentiment Analysis
Rationale for Auto-Regressive Model Design in Text Generation
Comparison of Arbitrary Order Prediction and Masked Language Modeling
Permuted Language Modeling (PLM)
Next Sentence Prediction as an Auxiliary Training Objective
Permuted Language Modeling
Learning Contextual Representations via Masked Token Prediction
A language model is being trained with the following objective: It is given a sentence with a single word randomly obscured, such as 'The quick brown [HIDDEN] jumps over the lazy dog.' The model's only task is to predict the original hidden word, 'fox'. Which of the following best describes the specific contextual information the model is designed to use to make this prediction?
Analyzing a Model Training Process
A language model is being trained on the sentence: 'The quick brown fox jumps over the lazy dog.' Which of the following training scenarios best exemplifies the process of learning by predicting an obscured word using its full surrounding context?
MASS-style Masked Language Modeling
BERT-style Masked Language Modeling
Impact of Pre-training/Fine-tuning Mismatch on Downstream Tasks
A language model is first trained on a large text corpus where some words in each sentence are replaced with a special [MASK] symbol, and the model's goal is to predict the original words. Subsequently, this pre-trained model is adapted for a specific task, such as sentiment analysis, using a new dataset of complete, un-masked sentences. Which of the following statements best analyzes the primary architectural conflict that arises between these two phases?
Troubleshooting a Pre-trained Model's Performance
Permuted Language Modeling (PLM)
Diagnosing a Language Model's Predictive Behavior
A language model pre-trained with a standard masked language modeling objective is given the input sentence: 'The capital of the United Kingdom is [MASK] [MASK].' Which statement best describes how the model will predict the two masked tokens?
Consequences of Independent Predictions in Language Models
Permuted Language Modeling (PLM)
Learn After
Probability Factorization for Arbitrary Order Token Prediction
A language model is pre-trained using an objective where, for the input sentence 'The model learns from text', it might be tasked to predict the word 'learns' based on the context of 'text' and 'The', while the word 'model' is not yet visible to it. In the next step, it might predict 'model' based on 'text', 'The', and the newly predicted 'learns'. What is the primary advantage of this training approach compared to a standard left-to-right sequential prediction?
A language model is being pre-trained on the sentence 'The quick brown fox jumps' using a permuted objective. The model is given a random permutation of the token positions: (3, 5, 1, 4, 2). Arrange the words from the sentence in the order they will be auto-regressively predicted during this training step.
Pre-training Objective Selection
Comparison of Permuted and Causal Language Modeling
Implementing Permutation via Self-Attention Masks
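The last two items above can be made concrete with a short sketch. Assuming the permutation (3, 5, 1, 4, 2) from the exercise (1-indexed positions), the code below derives the auto-regressive prediction order and builds the boolean self-attention mask that implements it: position i may attend to position j only if j comes earlier in the permutation. The function name `permutation_attention_mask` is illustrative, not from any library.

```python
def permutation_attention_mask(perm):
    """Boolean mask for a permuted factorization order.

    `perm` lists 1-indexed token positions in prediction order.
    mask[i][j] is True when position i (0-indexed) may attend to
    position j, i.e. when j was predicted strictly before i.
    """
    n = len(perm)
    rank = {pos - 1: t for t, pos in enumerate(perm)}  # 0-indexed position -> step
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]

perm = (3, 5, 1, 4, 2)
tokens = "The quick brown fox jumps".split()
print([tokens[p - 1] for p in perm])  # prediction order: brown, jumps, The, fox, quick

mask = permutation_attention_mask(perm)
for row in mask:
    print(["x" if v else "." for v in row])
```

Because the permutation is enforced purely through the mask, the input sequence itself keeps its original left-to-right layout, which is exactly the point made in the definition above.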