Learn Before
General Formulation of a Sequence Model
Masked Language Modeling
Masked Language Modeling (MLM) is a self-supervised learning objective in which a model is trained to predict tokens that have been randomly masked in an input sequence. This approach allows the model to learn deep bidirectional representations by using both left and right contexts. For instance, if tokens $x_2$ and $x_4$ are masked in a sequence $x_1\, x_2\, x_3\, x_4\, x_5$, the model's task is to predict these original tokens from the corrupted input, which can be represented as $\tilde{\mathbf{x}} = x_1\,[\mathrm{MASK}]\,x_3\,[\mathrm{MASK}]\,x_5$. The model predicts each masked token based on the full context of unmasked tokens, calculating conditional probabilities like $\Pr(x_2 \mid \mathbf{e}(\tilde{\mathbf{x}}))$ and $\Pr(x_4 \mid \mathbf{e}(\tilde{\mathbf{x}}))$, where $\mathbf{e}(\cdot)$ represents token embeddings.
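
The objective can be made concrete with a small sketch. The code below is a minimal, illustrative PyTorch example, not the exact recipe from any particular model: the architecture, the 15% masking rate, and all names such as `TinyMLM` and `mask_tokens` are assumptions introduced here. It randomly replaces tokens with a reserved [MASK] id, runs a bidirectional Transformer encoder over the corrupted sequence, and computes cross-entropy loss only at the masked positions.

```python
# Minimal MLM training-step sketch (illustrative names and hyperparameters).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 1000
MASK_ID = 0          # hypothetical id reserved for the [MASK] token
IGNORE_LABEL = -100  # label value skipped by cross_entropy

class TinyMLM(nn.Module):
    def __init__(self, vocab_size=VOCAB_SIZE, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        # No causal mask: every position attends to both left and right context.
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))  # contextual representations
        return self.lm_head(h)                   # vocabulary logits at every position

def mask_tokens(token_ids, mask_prob=0.15):
    """Corrupt the input and keep labels only at the masked positions."""
    labels = token_ids.clone()
    is_masked = torch.rand(token_ids.shape) < mask_prob
    corrupted = token_ids.masked_fill(is_masked, MASK_ID)  # x -> x~ with [MASK]
    labels[~is_masked] = IGNORE_LABEL                       # loss only where masked
    return corrupted, labels

# Toy training step: predict the original tokens from the corrupted sequence.
model = TinyMLM()
batch = torch.randint(1, VOCAB_SIZE, (2, 16))  # fake token ids (0 reserved for [MASK])
corrupted, labels = mask_tokens(batch)
logits = model(corrupted)
loss = F.cross_entropy(logits.view(-1, VOCAB_SIZE), labels.view(-1),
                       ignore_index=IGNORE_LABEL)
loss.backward()
```

Because the labels at unmasked positions are set to the ignored value, the loss corresponds only to the conditional probabilities of the masked tokens given the corrupted context, which is the MLM objective described above.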

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Output Variation in Sequence Models
Fundamental Issues in Sequence Model Formulation
Role of the [CLS] Token in Sequence Classification
Standard Auto-Regressive Probability Factorization
Masked Language Modeling
Comparison of Causal and Masked Language Modeling
Input Formatting with Separator Tokens