Learn Before
Concurrent Loss Calculation for MLM and NSP
In BERT's pre-training phase, a single input sequence built from a pair of text segments serves as the basis for calculating the losses of both Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). The two losses are computed independently from this shared input and then summed into a single objective for the model update.
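A minimal PyTorch-style sketch of this flow is given below. Everything in it is an illustrative assumption rather than BERT's actual architecture or data pipeline: a tiny Transformer encoder stands in for BERT, position 0 plays the [CLS] role, position 3 is pretended to be masked, and -100 marks positions that contribute no MLM loss.

```python
import torch
import torch.nn as nn

vocab_size, hidden, seq_len, batch = 1000, 64, 16, 4  # toy sizes, not BERT's

embeddings = nn.Embedding(vocab_size, hidden)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
    num_layers=2,
)
mlm_head = nn.Linear(hidden, vocab_size)  # scores vocabulary words at each position
nsp_head = nn.Linear(hidden, 2)           # IsNext vs NotNext from the [CLS] slot

params = (
    list(embeddings.parameters()) + list(encoder.parameters())
    + list(mlm_head.parameters()) + list(nsp_head.parameters())
)
optimizer = torch.optim.Adam(params, lr=1e-4)
mlm_loss_fn = nn.CrossEntropyLoss(ignore_index=-100)  # -100 = position not masked
nsp_loss_fn = nn.CrossEntropyLoss()

# Toy batch standing in for one tokenized sentence pair per row.
tokens = torch.randint(0, vocab_size, (batch, seq_len))
mlm_labels = torch.full((batch, seq_len), -100, dtype=torch.long)
mlm_labels[:, 3] = torch.randint(0, vocab_size, (batch,))  # pretend position 3 was masked
nsp_labels = torch.randint(0, 2, (batch,))                 # one IsNext/NotNext label per pair

# One shared forward pass over the single input sequence...
hidden_states = encoder(embeddings(tokens))                # (batch, seq_len, hidden)

# ...two losses computed independently from that shared representation...
mlm_loss = mlm_loss_fn(mlm_head(hidden_states).view(-1, vocab_size), mlm_labels.view(-1))
nsp_loss = nsp_loss_fn(nsp_head(hidden_states[:, 0]), nsp_labels)  # slot 0 as [CLS]

# ...summed into one objective, then a single parameter update.
optimizer.zero_grad()
(mlm_loss + nsp_loss).backward()
optimizer.step()
```

The point of the sketch is the last three lines: one forward pass, two independently computed losses, one backward pass on their sum, one optimizer step.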
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
BERT Loss Function
Concurrent Loss Calculation for MLM and NSP
A researcher is pre-training a large language model using a dual-task objective. The model is simultaneously trained on two tasks:
- Predicting randomly obscured words within a given text.
- Determining whether two text segments presented together originally appeared consecutively.

The final training update is based on the model's combined performance on both tasks. Which of the following statements best analyzes the primary advantage of this specific dual-task approach?
Evaluating a Modified Pre-training Strategy
The original pre-training process for the Bidirectional Encoder Representations from Transformers model involves a dual-task objective where the total loss is the sum of the losses from two distinct tasks. Match each training task to its corresponding description.
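Written out, the combined objective from the description above is simply the sum of the two per-task losses (standard notation; the subscripts are ours):

```latex
% Total pre-training loss: both terms are computed from the same input sequence.
\[
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{MLM}} + \mathcal{L}_{\text{NSP}}
\]
```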
Learn After
In a specific pre-training setup for a language model, a single input (composed of two sentences) is used to perform two distinct tasks simultaneously: one task involves predicting words that have been intentionally hidden in the text, and the other involves determining the relationship between the two sentences (e.g., whether one follows the other). Which statement accurately describes how the performance on these two tasks is used to update the model?
Consider a language model pre-training process that uses a single input sequence (e.g., a pair of sentences) to perform two tasks: predicting masked words and determining if the second sentence logically follows the first. In this process, the model first calculates the loss for the masked word task and updates its internal parameters. Then, using the same input, it calculates the loss for the sentence relationship task and performs a second, separate update to its parameters.
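A minimal sketch may make the contrast concrete. The function names and the compute_mlm_loss / compute_nsp_loss callables below are hypothetical placeholders, not an established API; the sequential variant mirrors the procedure described in this question, while the joint variant mirrors BERT's original single summed update.

```python
def train_step_sequential(model, optimizer, compute_mlm_loss, compute_nsp_loss, batch):
    """The modified strategy described above: two separate updates per input."""
    optimizer.zero_grad()
    compute_mlm_loss(model, batch).backward()
    optimizer.step()  # first update: parameters move based on the MLM loss alone

    optimizer.zero_grad()
    # Note: the NSP loss is now computed against the already-updated parameters.
    compute_nsp_loss(model, batch).backward()
    optimizer.step()  # second, separate update based on the NSP loss


def train_step_joint(model, optimizer, compute_mlm_loss, compute_nsp_loss, batch):
    """BERT's original strategy: one update on the summed loss."""
    optimizer.zero_grad()
    total_loss = compute_mlm_loss(model, batch) + compute_nsp_loss(model, batch)
    total_loss.backward()
    optimizer.step()  # single update on the combined objective
```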
A language model is being pre-trained using a dual-task objective on a single input sequence composed of two sentences. One task is to predict masked words within the sentences, and the other is to predict if the second sentence is the actual next sentence. Arrange the following steps in the correct computational order for a single training iteration.