Learn Before
BERT-style Masked Language Modeling
BERT-style masked language modeling is a variant in which individual, often non-contiguous, tokens in a sequence are masked or replaced with other tokens. For instance, given the sentence The kitten is chasing the ball . and the corrupted input [C] The kitten [M] chasing the [M] ., the model is trained to predict the original tokens (here is and ball) at the positions selected for corruption. As shown in Table 1, each masked token is reconstructed at its own position rather than being output as a single concatenated phrase. This approach is typically applied to encoder-only or encoder-decoder models.
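To make the corruption step concrete, the sketch below is illustrative only (it is not taken from the course materials, and the function name make_mlm_example is hypothetical). It picks individual token positions at random, replaces them with [M], and keeps a position-aligned list of targets so that only the corrupted positions are scored. Real BERT pre-training additionally swaps some selected tokens for random ones or leaves them unchanged; that detail is omitted here.

```python
import random

CLS, MASK = "[C]", "[M]"

def make_mlm_example(tokens, mask_prob=0.15, rng=None):
    """Corrupt individual token positions BERT-style and build
    position-aligned targets (None = position is not predicted)."""
    rng = rng or random.Random()
    corrupted, targets = [CLS], [None]      # the [C] token itself is never masked
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)          # hide the original token
            targets.append(tok)             # the model must recover it at this position
        else:
            corrupted.append(tok)           # keep the token visible
            targets.append(None)            # no loss at this position
    return corrupted, targets

sentence = ["The", "kitten", "is", "chasing", "the", "ball", "."]
corrupted, targets = make_mlm_example(sentence, mask_prob=0.3)
print(corrupted)  # one possible corruption: ['[C]', 'The', 'kitten', '[M]', 'chasing', 'the', '[M]', '.']
print(targets)    # aligned targets:         [None, None, None, 'is', None, None, 'ball', None]
```

The training loss is then computed only at positions whose target is not None, which is what distinguishes this objective from causal language modeling, where every position is predicted.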
Tags
Foundations of Large Language Models
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Arbitrary Order Prediction and Masked Language Modeling
Permuted Language Modeling (PLM)
Next Sentence Prediction as an Auxiliary Training Objective
Permuted Language Modeling
Learning Contextual Representations via Masked Token Prediction
A language model is being trained with the following objective: It is given a sentence with a single word randomly obscured, such as 'The quick brown [HIDDEN] jumps over the lazy dog.' The model's only task is to predict the original hidden word, 'fox'. Which of the following best describes the specific contextual information the model is designed to use to make this prediction?
Analyzing a Model Training Process
A language model is being trained on the sentence: 'The quick brown fox jumps over the lazy dog.' Which of the following training scenarios best exemplifies the process of learning by predicting an obscured word using its full surrounding context?
MASS-style Masked Language Modeling
BERT-style Masked Language Modeling