Learn Before
Masked Language Modeling (MLM)
As a specific application of the mask-predict framework, Masked Language Modeling (MLM) randomly masks tokens within a sequence and trains a model to predict the masked tokens from the full bidirectional context supplied by the remaining, unmasked tokens.
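The masking step can be made concrete with a short sketch. The snippet below is a minimal illustration, not any particular library's API: a hypothetical `mask_tokens` helper replaces a random fraction of whitespace-split tokens with a `[MASK]` symbol and records the originals as prediction targets, which is what makes the objective self-supervised.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random fraction of tokens with [MASK].

    Returns (masked_tokens, targets), where targets maps each masked
    position to the original token the model must predict.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK
            targets[i] = tok
    return masked, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3)
print(masked)   # the input the model sees, with some positions masked
print(targets)  # {position: original token}; labels come from the input itself
```

In practice, BERT-style implementations mask roughly 15% of tokens and sometimes replace a masked position with a random token or leave it unchanged; the sketch omits those refinements.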
References
Pre-trained Models for Natural Language Processing: A Survey
Foundations of Large Language Models Course
Tags
Data Science
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Masked Language Modeling (MLM)
A researcher is developing a model to understand patterns in unlabeled time-series data from weather sensors. The data for each day is a sequence of 24 hourly temperature readings. The researcher's training strategy is to take a sequence, randomly hide the reading for a single hour, and train the model to estimate the hidden value from the readings for the other 23 hours. Which fundamental training strategy does this approach best exemplify? (A minimal sketch of this masking step appears after this list.)
Dual Role of Data in a Self-Supervised Task
Analyzing Self-Supervised Training Procedures
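As referenced in the sensor question above, the same mask-and-predict recipe applies to continuous data. The sketch below is a toy illustration under assumed names (`make_training_example` is hypothetical): one of 24 hourly readings is hidden and becomes the regression target, so the unlabeled data supplies its own labels.

```python
import random

def make_training_example(hourly_temps, seed=None):
    """Turn one day of 24 readings into a self-supervised example.

    A single hour is hidden (set to None); that hidden reading becomes
    the regression target, so no external labels are required.
    """
    rng = random.Random(seed)
    assert len(hourly_temps) == 24
    hidden_hour = rng.randrange(24)
    inputs = list(hourly_temps)
    target = inputs[hidden_hour]
    inputs[hidden_hour] = None  # the model sees only the other 23 readings
    return inputs, hidden_hour, target

day = [12.0 + 0.5 * h for h in range(24)]  # toy synthetic temperatures
inputs, hour, target = make_training_example(day, seed=1)
print(hour, target)  # the masked position and the value to reconstruct
```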
Learn After
Comparison of Arbitrary Order Prediction and Masked Language Modeling
Permuted Language Modeling (PLM)
Next Sentence Prediction as an Auxiliary Training Objective
Permuted Language Modeling
Learning Contextual Representations via Masked Token Prediction
A language model is being trained with the following objective: It is given a sentence with a single word randomly obscured, such as 'The quick brown [HIDDEN] jumps over the lazy dog.' The model's only task is to predict the original hidden word, 'fox'. Which of the following best describes the specific contextual information the model is designed to use to make this prediction? (A sketch contrasting full-context and left-to-right prediction appears after this list.)
Analyzing a Model Training Process
A language model is being trained on the sentence: 'The quick brown fox jumps over the lazy dog.' Which of the following training scenarios best exemplifies the process of learning by predicting an obscured word using its full surrounding context?
MASS-style Masked Language Modeling
BERT-style Masked Language Modeling
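For the '[HIDDEN]' question above, the key point is that an MLM conditions on tokens on both sides of the gap, unlike a left-to-right (causal) model. The following sketch, with hypothetical helper names, simply enumerates the two context windows.

```python
def mlm_context(tokens, hidden_index):
    """Context an MLM can use: every token except the hidden one,
    drawn from both sides of the gap."""
    return tokens[:hidden_index] + tokens[hidden_index + 1:]

def causal_context(tokens, hidden_index):
    """Context a left-to-right (causal) LM can use: only the tokens
    that precede the hidden position."""
    return tokens[:hidden_index]

sentence = "The quick brown fox jumps over the lazy dog".split()
i = sentence.index("fox")
print(mlm_context(sentence, i))     # left AND right context
print(causal_context(sentence, i))  # left context only
```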