Concept

Training Objective of the Standard BERT Model

As proposed in the original paper by Devlin et al. (2019), the standard BERT model is a Transformer encoder pre-trained with two objectives learned jointly: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). The total training loss is the sum of the two individual losses.
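In symbols (notation ours; the note states the relationship only in words):

$$
\mathcal{L} = \mathcal{L}_{\text{MLM}} + \mathcal{L}_{\text{NSP}}
$$

where $\mathcal{L}_{\text{MLM}}$ is the cross-entropy over the masked token positions and $\mathcal{L}_{\text{NSP}}$ is the cross-entropy of the binary IsNext/NotNext classification on the sentence pair.

A minimal PyTorch sketch of the combined loss, assuming the model's forward pass has already produced the logits (function and tensor names here are hypothetical, not from the paper):

```python
import torch.nn.functional as F

# Hypothetical inputs standing in for a BERT forward pass:
#   mlm_logits: (batch, seq_len, vocab_size) scores for every position
#   mlm_labels: (batch, seq_len) true token ids, -100 at unmasked positions
#   nsp_logits: (batch, 2) scores for the IsNext / NotNext classes
#   nsp_labels: (batch,) 1 if sentence B actually follows sentence A, else 0

def bert_pretraining_loss(mlm_logits, mlm_labels, nsp_logits, nsp_labels):
    # Cross-entropy over masked positions only (ignore_index skips the -100s)
    mlm_loss = F.cross_entropy(
        mlm_logits.view(-1, mlm_logits.size(-1)),
        mlm_labels.view(-1),
        ignore_index=-100,
    )
    # Binary classification loss for next-sentence prediction
    nsp_loss = F.cross_entropy(nsp_logits, nsp_labels)
    # The total pre-training loss is the simple (unweighted) sum of the two
    return mlm_loss + nsp_loss
```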


Updated 2026-05-02

Tags

Data Science

Foundations of Large Language Models Course

Computing Sciences

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models
