Example of Masked Language Modeling Prediction
Masked Language Modeling (MLM) trains a model to predict masked tokens by using the surrounding unmasked tokens as context. For instance, if an original sequence x0, x1, x2, x3, x4 is modified by masking tokens x1 and x3, the input becomes x0, [MASK], x2, [MASK], x4. The model's objective is to predict the original values of x1 and x3. This is achieved by conditioning the prediction on the embeddings of the entire input sequence, including the unmasked tokens and the special [MASK] tokens. The probabilities are formally expressed as Pr(x1 | x0, [MASK], x2, [MASK], x4) and Pr(x3 | x0, [MASK], x2, [MASK], x4). The unmasked tokens (x0, x2, x4) are not predicted; their output can be considered to have a probability of 1.
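To make the example concrete, here is a minimal sketch of MLM prediction. It assumes the Hugging Face transformers library and the public bert-base-uncased checkpoint, neither of which is mentioned above; it simply mirrors the x0, [MASK], x2, [MASK], x4 setup with a real model.

```python
# Minimal MLM prediction sketch (assumes `pip install transformers torch`
# and the public bert-base-uncased checkpoint -- both are assumptions,
# not something stated in the card above).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Five-token sequence with positions x1 and x3 masked:
# x0=The, x1=[MASK], x2=fox, x3=[MASK], x4=dog
predictions = fill_mask("The [MASK] fox [MASK] dog")

# With two masks, the pipeline returns one candidate list per masked
# position; each candidate carries a predicted token and its probability.
for position, candidates in enumerate(predictions, start=1):
    best = candidates[0]
    print(f"mask {position}: {best['token_str']!r} with probability {best['score']:.3f}")
```

Note that the model produces a probability distribution for each masked position independently, conditioned on the same full input; this matches the two conditional probabilities written out above.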

Related
Causal Language Modeling as a Special Case of Masked Language Modeling
Consider two different approaches for training a language model to predict a specific word within a sentence.
Approach 1: The model is trained to predict the next word in a sequence, using only the words that have appeared before it.
Approach 2: The model is trained to predict a word that has been intentionally hidden, using all the other visible words in the sentence, both those that come before and after the hidden word.
If both models are tasked with predicting the word 'jumps' in the sentence 'The quick brown fox jumps over the lazy dog', which statement correctly analyzes the contextual information available to each model for this specific task?
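Before answering, it may help to see the difference concretely. The short Python snippet below is illustrative only (it is not part of the question); it prints the context each approach can use when predicting 'jumps':

```python
# Illustrative only: the context visible to each approach when
# predicting 'jumps' (index 4) in the example sentence.
tokens = "The quick brown fox jumps over the lazy dog".split()
target = tokens.index("jumps")

# Approach 1 (causal): only the words before the target.
causal_context = tokens[:target]

# Approach 2 (masked): every word except the hidden target itself.
masked_context = tokens[:target] + ["[MASK]"] + tokens[target + 1:]

print("Approach 1 sees:", causal_context)
print("Approach 2 sees:", masked_context)
```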
Choosing the Right Contextual Approach for Language Tasks
Match each description of a language model's prediction task or characteristic to the type of contextual information it utilizes.
Learn After
A language model is given the input sentence: 'The quick brown [MASK] jumps over the lazy dog.' The model's objective is to predict the masked word by considering the full context of the unmasked words around it, both to the left and to the right. Which set of words provides the necessary context for the model to make this prediction?
Masked Language Model Prediction Task
Consider a language model being trained with the input sequence: 'The quick brown [MASK] jumps over the [MASK] dog.' During the training process, the model's objective is to correctly predict the words for the two [MASK] tokens, and also to confirm the identities of the unmasked words ('The', 'quick', 'brown', etc.).
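One hedged way to see how such a training example is typically encoded: the sketch below uses the PyTorch/Hugging Face convention of labeling unmasked positions with the ignore index -100, so the loss (and hence prediction) applies only at the two [MASK] positions. Both the convention and the toy vocabulary are assumptions for illustration, not something stated in the question.

```python
# Hedged sketch: encode MLM training targets for the sentence above.
# The -100 "ignore" label is a PyTorch/Hugging Face convention and an
# assumption here; the toy vocabulary is likewise made up for illustration.
import torch

tokens    = ["The", "quick", "brown", "[MASK]", "jumps", "over", "the", "[MASK]", "dog"]
originals = ["The", "quick", "brown", "fox",    "jumps", "over", "the", "lazy",   "dog"]
vocab = {word: i for i, word in enumerate(sorted(set(originals)))}

# Loss is computed only where the input was masked; unmasked positions get
# the ignore label, i.e. the model is NOT asked to re-predict them.
labels = torch.tensor([
    vocab[orig] if tok == "[MASK]" else -100
    for tok, orig in zip(tokens, originals)
])
print(labels)  # real target ids appear only at the two masked positions
```

Under this encoding, "confirming the identities of the unmasked words" is not part of the objective, which is consistent with the main card's statement that unmasked tokens are not predicted.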