Learn Before
  • General Formulation of a Sequence Model

Masked Language Modeling

Masked Language Modeling (MLM) is a self-supervised learning objective in which a model is trained to predict tokens that have been randomly masked in an input sequence. This approach allows the model to learn deep bidirectional representations by using both left and right context. For instance, if tokens $x_1$ and $x_3$ are masked in a sequence, the model's task is to predict these original tokens from the corrupted input, which can be represented as $(x_0, \text{[MASK]}, x_2, \text{[MASK]}, x_4) \rightarrow (x_1, x_3)$. The model predicts each masked token from the full context of the unmasked tokens, computing conditional probabilities such as $\Pr(x_1 \mid \mathbf{e}_0, \mathbf{e}_{\text{mask}}, \mathbf{e}_2, \mathbf{e}_{\text{mask}}, \mathbf{e}_4)$ and $\Pr(x_3 \mid \mathbf{e}_0, \mathbf{e}_{\text{mask}}, \mathbf{e}_2, \mathbf{e}_{\text{mask}}, \mathbf{e}_4)$, where $\mathbf{e}_i$ denotes the embedding of the token at position $i$ and $\mathbf{e}_{\text{mask}}$ the embedding of the [MASK] token.

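A minimal PyTorch sketch of this corrupt-then-predict setup is given below. The masking rate, the [MASK] token id, and the toy token ids are illustrative assumptions rather than any specific model's recipe, and the loss computation is only indicated in comments because it depends on a concrete model.

```python
import torch

# Assumed constants for illustration: the id of the [MASK] token and
# the fraction of positions to corrupt.
MASK_ID = 103
MASK_RATE = 0.15

def mask_tokens(input_ids: torch.Tensor, mask_rate: float = MASK_RATE):
    """Randomly replace tokens with [MASK]; return (corrupted input, labels)."""
    labels = input_ids.clone()
    # Choose positions to mask uniformly at random.
    mask = torch.rand(input_ids.shape) < mask_rate
    corrupted = input_ids.clone()
    corrupted[mask] = MASK_ID
    # The loss is computed only at masked positions; -100 marks ignored ones.
    labels[~mask] = -100
    return corrupted, labels

# The example from the text: x1 and x3 are replaced by [MASK].
x = torch.tensor([[7, 42, 11, 59, 8]])     # (x0, x1, x2, x3, x4), toy ids
corrupted = x.clone()
corrupted[0, [1, 3]] = MASK_ID             # (x0, [MASK], x2, [MASK], x4)
labels = torch.full_like(x, -100)
labels[0, [1, 3]] = x[0, [1, 3]]           # targets: the original x1 and x3

# Training then minimizes cross-entropy between the model's predictions at
# the masked positions and the original tokens, e.g.:
#   logits = model(corrupted)              # shape (batch, seq_len, vocab)
#   loss = torch.nn.functional.cross_entropy(
#       logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=-100)
```

Because only the masked positions contribute to the loss, the model can attend to the entire corrupted sequence when predicting each target, which is what makes the learned representations bidirectional.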
Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Related
  • Output Variation in Sequence Models

  • Fundamental Issues in Sequence Model Formulation

  • Role of the [CLS] Token in Sequence Classification

  • Standard Auto-Regressive Probability Factorization

  • Masked Language Modeling

  • Comparison of Causal and Masked Language Modeling

  • Input Formatting with Separator Tokens