1Cademy - An engineer is pre-training a language model that simultaneously learns to predict masked words in a sentence and to determine if two sentences are consecutive. In a single training step, the loss for the masked word prediction task is calculated as 1.8, and the loss for the sentence relationship task is 0.6. What is the total loss value that will be used to update the models parameters for this step?

Learn Before

BERT Loss Function

Multiple Choice

An engineer is pre-training a language model that simultaneously learns to predict masked words in a sentence and to determine if two sentences are consecutive. In a single training step, the loss for the masked word prediction task is calculated as 1.8, and the loss for the sentence relationship task is 0.6. What is the total loss value that will be used to update the model's parameters for this step?

Updated 2025-09-26

Contributors are:

Who are from:

Learn Before

Related