Formula

BERT Loss Function

The total training loss for a standard BERT model is the sum of the losses from its two pre-training tasks, masked language modeling (MLM) and next sentence prediction (NSP): $\mathrm{Loss}_{\mathrm{BERT}} = \mathrm{Loss}_{\mathrm{MLM}} + \mathrm{Loss}_{\mathrm{NSP}}$.
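As a rough illustration of this sum, the following minimal sketch computes both terms for a single toy example in plain Python. It assumes that each term is a cross-entropy loss and that the MLM term is averaged over the masked positions only. The function names, the three-word vocabulary, and the probability values are hypothetical and chosen purely for illustration; they are not taken from any particular BERT implementation.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target])


def mlm_loss(masked_probs, masked_targets):
    """Mean cross-entropy over masked positions (averaging is an assumption)."""
    losses = [cross_entropy(p, t) for p, t in zip(masked_probs, masked_targets)]
    return sum(losses) / len(losses)


def nsp_loss(nsp_probs, label):
    """Cross-entropy of the two-class next-sentence prediction."""
    return cross_entropy(nsp_probs, label)


def bert_loss(masked_probs, masked_targets, nsp_probs, label):
    """Loss_BERT = Loss_MLM + Loss_NSP."""
    return mlm_loss(masked_probs, masked_targets) + nsp_loss(nsp_probs, label)


# Hypothetical toy inputs: two masked tokens over a 3-word vocabulary,
# and one sentence pair whose true NSP label is class 0.
masked_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
masked_targets = [0, 1]
nsp_probs = [0.9, 0.1]
nsp_label = 0

total = bert_loss(masked_probs, masked_targets, nsp_probs, nsp_label)
print(f"MLM loss:  {mlm_loss(masked_probs, masked_targets):.4f}")
print(f"NSP loss:  {nsp_loss(nsp_probs, nsp_label):.4f}")
print(f"BERT loss: {total:.4f}")
```

Because the two terms are simply added, both tasks contribute to every gradient update with equal weight in this sketch.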

Updated 2026-05-26

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

D2L

Dive into Deep Learning @ D2L