Concept

Concurrent Loss Calculation for MLM and NSP

In BERT's pre-training phase, a single input sequence, constructed from a pair of sentences, is used to compute the losses for both Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). The two losses are computed independently from this shared input and then summed into a single training objective that drives the model update.
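As a concrete illustration, the minimal PyTorch sketch below shows how the two losses might be computed from one shared forward pass and then summed before backpropagation. The tensor shapes, the masked positions, and names such as mlm_logits and nsp_labels are illustrative assumptions for this sketch, not BERT's reference implementation.

```python
import torch
import torch.nn as nn

vocab_size, seq_len, batch_size = 30522, 16, 4  # 30522 is BERT-base's WordPiece vocab size

# Stand-ins for what a BERT-style model would return from one forward pass over
# a (sentence A, sentence B) input; in practice both heads sit on the same encoder.
mlm_logits = torch.randn(batch_size, seq_len, vocab_size, requires_grad=True)  # per-token vocabulary scores
nsp_logits = torch.randn(batch_size, 2, requires_grad=True)                     # IsNext / NotNext scores

# MLM labels: -100 marks positions that were not masked, so they are ignored by the loss.
mlm_labels = torch.full((batch_size, seq_len), -100)
mlm_labels[:, [3, 7]] = torch.randint(0, vocab_size, (batch_size, 2))  # two masked positions per sequence
nsp_labels = torch.randint(0, 2, (batch_size,))  # 0 / 1 for the two NSP classes (illustrative convention)

loss_fct = nn.CrossEntropyLoss(ignore_index=-100)

# The two losses are computed independently from the same shared input ...
mlm_loss = loss_fct(mlm_logits.view(-1, vocab_size), mlm_labels.view(-1))
nsp_loss = loss_fct(nsp_logits, nsp_labels)

# ... and summed into a single objective for one parameter update.
total_loss = mlm_loss + nsp_loss
total_loss.backward()
print(f"MLM: {mlm_loss.item():.3f}  NSP: {nsp_loss.item():.3f}  total: {total_loss.item():.3f}")
```

Because the gradient of the summed loss flows back through the same shared encoder, a single update step reflects both objectives at once.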

