1Cademy - BERT Loss Function

Learn Before

Training Objective of the Standard BERT Model

Formula

BERT Loss Function

The total training loss for a standard BERT model is the sum of the losses from its two pre-training tasks, masked language modeling (MLM) and next sentence prediction (NSP): $\mathrm{Loss}_{\mathrm{BERT}} = \mathrm{Loss}_{\mathrm{MLM}} + \mathrm{Loss}_{\mathrm{NSP}}$.
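
To make the sum concrete, here is a minimal sketch in plain Python. It rests on assumptions that go beyond the formula above: both terms are taken to be cross-entropy losses, the MLM term is averaged over the masked positions only, and the names `cross_entropy` and `bert_loss` as well as the toy probabilities are hypothetical, chosen purely for illustration.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target])


def bert_loss(mlm_probs, mlm_targets, nsp_probs, nsp_target):
    """Return (total, mlm, nsp) for one toy training example.

    Assumption: the MLM loss is the mean cross-entropy over masked
    positions, and the NSP loss is a two-class cross-entropy.
    """
    # MLM term: one predicted distribution per masked token.
    mlm_loss = sum(
        cross_entropy(p, t) for p, t in zip(mlm_probs, mlm_targets)
    ) / len(mlm_targets)
    # NSP term: a single is-next / not-next prediction per sentence pair.
    nsp_loss = cross_entropy(nsp_probs, nsp_target)
    # Loss_BERT = Loss_MLM + Loss_NSP
    return mlm_loss + nsp_loss, mlm_loss, nsp_loss


# Toy inputs (made up): two masked positions over a 3-word vocabulary,
# plus one sentence-pair prediction.
mlm_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
mlm_targets = [0, 1]
nsp_probs = [0.9, 0.1]
nsp_target = 0

total, mlm, nsp = bert_loss(mlm_probs, mlm_targets, nsp_probs, nsp_target)
print(f"MLM={mlm:.4f}  NSP={nsp:.4f}  total={total:.4f}")
```

Because the two terms are simply added, neither task is weighted above the other in this sketch; any weighting scheme would be a departure from the formula as stated.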