Learn Before
  • Training Objective of the Standard BERT Model

BERT Loss Function

The total training loss for the BERT model is calculated by summing the individual losses from its two pre-training objectives: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). The formula is expressed as: $Loss_{BERT} = Loss_{MLM} + Loss_{NSP}$.
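This summation can be sketched in a few lines of Python. The function name and the numeric loss values below are illustrative assumptions, not part of the BERT source code; they only show that the combined objective is an unweighted sum of the two per-task losses.

```python
def bert_total_loss(mlm_loss: float, nsp_loss: float) -> float:
    """Total BERT pre-training loss: the unweighted sum of the
    Masked Language Modeling (MLM) and Next Sentence Prediction
    (NSP) losses computed for the same training step."""
    return mlm_loss + nsp_loss

# Illustrative per-task losses from a hypothetical training step.
total = bert_total_loss(1.8, 0.6)
print(total)
```

Because the two losses are simply added, gradients from both tasks flow back through the shared encoder in a single update step.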

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Related
  • BERT Loss Function

  • Concurrent Loss Calculation for MLM and NSP

  • A researcher is pre-training a large language model using a dual-task objective. The model is simultaneously trained on two tasks:

    1. Predicting randomly obscured words within a given text.
    2. Determining if two text segments presented together originally appeared consecutively.

    The final training update is based on the model's combined performance on both tasks. Which of the following statements best analyzes the primary advantage of this specific dual-task approach?
  • Evaluating a Modified Pre-training Strategy

  • The original pre-training process for the Bidirectional Encoder Representations from Transformers model involves a dual-task objective where the total loss is the sum of the losses from two distinct tasks. Match each training task to its corresponding description.

Learn After
  • BERT Training Process

  • An engineer is pre-training a language model that simultaneously learns to predict masked words in a sentence and to determine if two sentences are consecutive. In a single training step, the loss for the masked word prediction task is calculated as 1.8, and the loss for the sentence relationship task is 0.6. What is the total loss value that will be used to update the model's parameters for this step?

  • Analyzing Language Model Training Loss

  • Analyzing Dual-Task Model Training Performance