Learn Before
  • Self-Supervised Pre-training of Encoders via Masked Language Modeling

Limitation of MLM: Ignoring Dependencies Between Masked Tokens

A key limitation of the auto-encoding objective in Masked Language Modeling (MLM) is its failure to account for dependencies among the masked tokens. The model is trained to predict each masked token independently of the others. For example, if two tokens x_2 and x_6 in a sequence are masked, the prediction for the first masked token (x_2) is generated independently of the second masked token (x_6), and vice versa, even though each masked token should ideally be part of the context for predicting the other.
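
To make this independence concrete, below is a minimal PyTorch sketch (not taken from the course material; the toy encoder, the token ids, and the use of -100 as an ignore index are illustrative assumptions). It shows that the MLM objective reduces to a sum of separate cross-entropy terms, one per masked position, all computed from a single forward pass over the corrupted input, so the prediction for x_2 never conditions on the token predicted at x_6.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, hidden, mask_id = 1000, 64, 0  # toy sizes, illustrative only

# A toy "encoder": embedding + one Transformer encoder layer + LM head.
embed = nn.Embedding(vocab_size, hidden)
encoder = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
lm_head = nn.Linear(hidden, vocab_size)

# Corrupted sequence: positions 2 and 6 hold the [MASK] id (standing in for x_2 and x_6).
tokens = torch.tensor([[5, 17, mask_id, 42, 8, 99, mask_id, 3]])
# Labels carry the original token ids only at masked positions; -100 means "ignore".
labels = torch.tensor([[-100, -100, 23, -100, -100, -100, 77, -100]])

hidden_states = encoder(embed(tokens))   # single forward pass over the masked input
logits = lm_head(hidden_states)          # shape: (1, seq_len, vocab_size)

# The MLM loss is a sum of independent per-position cross-entropies:
# the distribution at position 2 is computed without ever seeing the true
# (or predicted) token at position 6, and vice versa.
masked = labels != -100
loss = F.cross_entropy(logits[masked], labels[masked])
print(loss.item())
```

Because both cross-entropy terms are computed from the same corrupted input, the model has no mechanism for making its two guesses consistent with each other, which is the behavior explored in the "Learn After" items below.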

Tags
  • Ch.1 Pre-training - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences
Related
  • Comparison of Masked vs. Causal Language Modeling

  • Formal Definition of the Masking Process in MLM

  • Example of Masked Language Modeling with Single and Multiple Masks

  • Training Objective of Masked Language Modeling (MLM)

  • Drawback of Masked Language Modeling: The [MASK] Token Discrepancy

  • Limitation of MLM: Ignoring Dependencies Between Masked Tokens

  • The Generator in Replaced Token Detection

  • Consecutive Token Masking in MLM

  • Token Selection and Modification Strategy in BERT's MLM

  • BERT's Masked Language Modeling Pre-training Pipeline

  • Performance Degradation and Early Stopping in Pre-training

  • Flexibility of Masked Language Modeling for Encoder-Decoder Training

  • Training Objective of the Standard BERT Model

  • During a self-supervised pre-training process, a model is given an input sequence where one word has been replaced by a special symbol, for example: 'The quick brown [MASK] jumps over the lazy dog.' The model's objective is to predict the original word, 'fox'. Which of the following is the direct input used by the final output layer to make this specific prediction?

  • Original Sequence for Masking and Deletion Examples

  • Arrange the following steps in the correct order to describe the process of pre-training an encoder model using a masked language modeling objective.

  • Evaluating a Pre-training Strategy for a Specific Application

Learn After
  • Diagnosing a Language Model's Predictive Behavior

  • A language model pre-trained with a standard masked language modeling objective is given the input sentence: 'The capital of the United Kingdom is [MASK] [MASK].' Which statement best describes how the model will predict the two masked tokens?

  • Consequences of Independent Predictions in Language Models

  • Permuted Language Modeling (PLM)