Analyzing Dual-Task Model Training Performance
An engineer is training a large language model using a dual-task objective. The total training loss is the sum of the losses from two individual tasks: Task A (predicting randomly hidden words in a text) and Task B (determining if two sentences appear consecutively in the original text). Analyze the training log below and explain which task the model appears to be mastering more quickly. Justify your answer by referencing the trends in the loss values.
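The dual-task objective and the "which task is improving faster" comparison can be made concrete with a minimal Python sketch. It assumes the log reports one loss value per task at each logged step. The function names `total_loss`, `relative_drop`, and `faster_task` are hypothetical. Comparing the *fraction* of the starting loss that has been eliminated is only one reasonable criterion. It is used here because the two losses sit on different scales: a word-prediction loss over a large vocabulary typically starts much higher than a binary sentence-pair loss.

```python
from typing import Sequence


def total_loss(loss_a: float, loss_b: float) -> float:
    """Dual-task objective: the total loss is the plain sum of the two task losses."""
    return loss_a + loss_b


def relative_drop(losses: Sequence[float]) -> float:
    """Fraction of the initial loss eliminated between the first and last logged step."""
    if len(losses) < 2:
        raise ValueError("need at least two logged loss values")
    first, last = losses[0], losses[-1]
    if first <= 0:
        raise ValueError("initial loss must be positive")
    return (first - last) / first


def faster_task(losses_a: Sequence[float], losses_b: Sequence[float]) -> str:
    """Name the task whose loss fell by the larger fraction of its starting value.

    Hypothetical helper: it compares only the endpoints of each series, so it
    ignores plateaus or spikes in between; inspect the full trend as well.
    """
    drop_a = relative_drop(losses_a)
    drop_b = relative_drop(losses_b)
    if drop_a == drop_b:
        return "tie"
    return "Task A" if drop_a > drop_b else "Task B"
```

Looking only at raw loss values can mislead: a task whose loss falls from a high starting point may still show a larger absolute decrease even while the other task has nearly converged, which is why the sketch normalizes by each task's initial loss.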
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
BERT Training Process
An engineer is pre-training a language model that simultaneously learns to predict masked words in a sentence and to determine if two sentences are consecutive. In a single training step, the loss for the masked word prediction task is calculated as 1.8, and the loss for the sentence relationship task is 0.6. What is the total loss value that will be used to update the model's parameters for this step?
Analyzing Language Model Training Loss