Learn Before
Concurrent Loss Calculation for MLM and NSP
In BERT's pre-training phase, a single input sequence built from a pair of text segments serves as the basis for calculating the losses of both Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). The two losses are computed independently from this shared input and then summed into a single objective for the model update.
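A minimal PyTorch-style sketch of this flow is given below. Everything in it is an illustrative assumption rather than BERT's actual architecture or data pipeline: a tiny Transformer encoder stands in for BERT, position 0 plays the [CLS] role, position 3 is pretended to be masked, and -100 marks positions that contribute no MLM loss.

```python
import torch
import torch.nn as nn

vocab_size, hidden, seq_len, batch = 1000, 64, 16, 4  # toy sizes, not BERT's

embeddings = nn.Embedding(vocab_size, hidden)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
    num_layers=2,
)
mlm_head = nn.Linear(hidden, vocab_size)  # scores vocabulary words at each position
nsp_head = nn.Linear(hidden, 2)           # IsNext vs NotNext from the [CLS] slot

params = (
    list(embeddings.parameters()) + list(encoder.parameters())
    + list(mlm_head.parameters()) + list(nsp_head.parameters())
)
optimizer = torch.optim.Adam(params, lr=1e-4)
mlm_loss_fn = nn.CrossEntropyLoss(ignore_index=-100)  # -100 = position not masked
nsp_loss_fn = nn.CrossEntropyLoss()

# Toy batch standing in for one tokenized sentence pair per row.
tokens = torch.randint(0, vocab_size, (batch, seq_len))
mlm_labels = torch.full((batch, seq_len), -100, dtype=torch.long)
mlm_labels[:, 3] = torch.randint(0, vocab_size, (batch,))  # pretend position 3 was masked
nsp_labels = torch.randint(0, 2, (batch,))                 # one IsNext/NotNext label per pair

# One shared forward pass over the single input sequence...
hidden_states = encoder(embeddings(tokens))                # (batch, seq_len, hidden)

# ...two losses computed independently from that shared representation...
mlm_loss = mlm_loss_fn(mlm_head(hidden_states).view(-1, vocab_size), mlm_labels.view(-1))
nsp_loss = nsp_loss_fn(nsp_head(hidden_states[:, 0]), nsp_labels)  # slot 0 as [CLS]

# ...summed into one objective, then a single parameter update.
optimizer.zero_grad()
(mlm_loss + nsp_loss).backward()
optimizer.step()
```

The point of the sketch is the last three lines: one forward pass, two independently computed losses, one backward pass on their sum, one optimizer step.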
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
BERT Loss Function
Concurrent Loss Calculation for MLM and NSP
A researcher is pre-training a large language model using a dual-task objective. The model is simultaneously trained on two tasks:
- Predicting randomly obscured words within a given text.
- Determining whether two text segments presented together originally appeared consecutively.

The final training update is based on the model's combined performance on both tasks. Which of the following statements best analyzes the primary advantage of this specific dual-task approach?
Evaluating a Modified Pre-training Strategy
The original pre-training process for the Bidirectional Encoder Representations from Transformers model involves a dual-task objective where the total loss is the sum of the losses from two distinct tasks. Match each training task to its corresponding description.
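Written out, the combined objective from the description above is simply the sum of the two per-task losses (standard notation; the subscripts are ours):

```latex
% Total pre-training loss: both terms are computed from the same input sequence.
\[
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{MLM}} + \mathcal{L}_{\text{NSP}}
\]
```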
Learn After
In a specific pre-training setup for a language model, a single input (composed of two sentences) is used to perform two distinct tasks simultaneously: one task involves predicting words that have been intentionally hidden in the text, and the other involves determining the relationship between the two sentences (e.g., whether one follows the other). Which statement accurately describes how the performance on these two tasks is used to update the model?
Consider a language model pre-training process that uses a single input sequence (e.g., a pair of sentences) to perform two tasks: predicting masked words and determining if the second sentence logically follows the first. In this process, the model first calculates the loss for the masked word task and updates its internal parameters. Then, using the same input, it calculates the loss for the sentence relationship task and performs a second, separate update to its parameters.
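A minimal sketch may make the contrast concrete. The function names and the compute_mlm_loss / compute_nsp_loss callables below are hypothetical placeholders, not an established API; the sequential variant mirrors the procedure described in this question, while the joint variant mirrors BERT's original single summed update.

```python
def train_step_sequential(model, optimizer, compute_mlm_loss, compute_nsp_loss, batch):
    """The modified strategy described above: two separate updates per input."""
    optimizer.zero_grad()
    compute_mlm_loss(model, batch).backward()
    optimizer.step()  # first update: parameters move based on the MLM loss alone

    optimizer.zero_grad()
    # Note: the NSP loss is now computed against the already-updated parameters.
    compute_nsp_loss(model, batch).backward()
    optimizer.step()  # second, separate update based on the NSP loss


def train_step_joint(model, optimizer, compute_mlm_loss, compute_nsp_loss, batch):
    """BERT's original strategy: one update on the summed loss."""
    optimizer.zero_grad()
    total_loss = compute_mlm_loss(model, batch) + compute_nsp_loss(model, batch)
    total_loss.backward()
    optimizer.step()  # single update on the combined objective
```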
A language model is being pre-trained using a dual-task objective on a single input sequence composed of two sentences. One task is to predict masked words within the sentences, and the other is to predict if the second sentence is the actual next sentence. Arrange the following steps in the correct computational order for a single training iteration.