Example of Masked Language Modeling with Single and Multiple Masks
To illustrate Masked Language Modeling, consider the sentence 'The early bird catches the worm'. A training instance is created by masking one or more tokens. For example, masking a single token might yield 'The early bird [MASK] the worm', where the model must predict 'catches'. Alternatively, masking multiple tokens, such as 'early' and 'worm', yields the corrupted sequence 'The [MASK] bird catches the [MASK]'. The model must then predict both masked words from the surrounding context.
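A minimal sketch of how the two training instances above can be constructed. The mask_tokens helper, the whitespace tokenization, and the 0-based positions are illustrative assumptions, not the course's implementation.

```python
def mask_tokens(tokens, positions, mask_symbol="[MASK]"):
    """Return a copy of `tokens` with the given positions replaced by `mask_symbol`."""
    return [mask_symbol if i in positions else tok for i, tok in enumerate(tokens)]

sentence = "The early bird catches the worm"
tokens = sentence.split()  # ['The', 'early', 'bird', 'catches', 'the', 'worm']

# Single mask: hide 'catches' (position 3, 0-based); the target is 'catches'.
single = mask_tokens(tokens, {3})
print(" ".join(single))    # The early bird [MASK] the worm

# Multiple masks: hide 'early' and 'worm' (positions 1 and 5, 0-based);
# the targets are 'early' and 'worm'.
multiple = mask_tokens(tokens, {1, 5})
print(" ".join(multiple))  # The [MASK] bird catches the [MASK]
```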
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Related
Comparison of Masked vs. Causal Language Modeling
Formal Definition of the Masking Process in MLM
Example of Masked Language Modeling with Single and Multiple Masks
Training Objective of Masked Language Modeling (MLM)
Drawback of Masked Language Modeling: The [MASK] Token Discrepancy
Limitation of MLM: Ignoring Dependencies Between Masked Tokens
The Generator in Replaced Token Detection
Consecutive Token Masking in MLM
Token Selection and Modification Strategy in BERT's MLM
BERT's Masked Language Modeling Pre-training Pipeline
Performance Degradation and Early Stopping in Pre-training
Flexibility of Masked Language Modeling for Encoder-Decoder Training
Training Objective of the Standard BERT Model
During a self-supervised pre-training process, a model is given an input sequence where one word has been replaced by a special symbol, for example: 'The quick brown [MASK] jumps over the lazy dog.' The model's objective is to predict the original word, 'fox'. Which of the following is the direct input used by the final output layer to make this specific prediction?
Original Sequence for Masking and Deletion Examples
Arrange the following steps in the correct order to describe the process of pre-training an encoder model using a masked language modeling objective.
Evaluating a Pre-training Strategy for a Specific Application
Example of Masked Language Modeling with Single and Multiple Masks
In a masked modeling approach, an input sequence x is transformed into a modified sequence x̄ by replacing tokens at a randomly selected set of positions A(x) with a special [MASK] symbol. Given an input sequence x = (T1, T2, T3, T4, T5, T6) and a set of selected positions A(x) = {2, 5} (using 0-based indexing), what is the resulting modified sequence x̄?
Critique of a Masking Implementation
In the formal definition of the masking process used in language models, several components are used to describe the transformation of an input sequence. Match each symbolic component with its correct description.
Learn After
A language model is being trained on the sentence: 'The chef carefully seasoned the delicious soup with a pinch of salt.' Consider two different training examples created from this sentence:
Example 1: '... seasoned the delicious soup with a [MASK] of salt.'
Example 2: 'The [MASK] carefully seasoned the delicious [MASK] with a pinch of salt.'
Which statement best analyzes the difference in the learning objective between these two examples?
Applying Masking Strategies for Language Models
Applying Masked Language Modeling