An AI research team is pre-training a large language model. They design a process where the model is simultaneously optimized on two distinct tasks: 1) predicting randomly hidden words within a sentence, and 2) determining whether two sentences presented together originally appeared in sequence in the source text. What is the most likely reason for this dual-task training approach?
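The setup described matches BERT-style pre-training, where a masked-word objective and a sentence-order objective are combined into a single loss so one shared encoder must learn both word-level and sentence-level representations. Below is a minimal sketch of how the two losses add together; all probabilities and targets are illustrative toy numbers, not outputs of any real model.

```python
import math

def cross_entropy(probs, target):
    # Negative log-likelihood of the correct class.
    return -math.log(probs[target])

# Task 1: predict a hidden word (toy vocabulary of 4 words).
mlm_probs = [0.1, 0.7, 0.1, 0.1]   # model's distribution at the masked position
mlm_target = 1                      # index of the original hidden word

# Task 2: decide whether the two sentences were consecutive.
nsp_probs = [0.8, 0.2]              # P(consecutive), P(not consecutive)
nsp_target = 0                      # they really were consecutive

# The objectives are optimized jointly: gradients flow from one combined
# loss, so the shared encoder must satisfy both tasks at once.
mlm_loss = cross_entropy(mlm_probs, mlm_target)
nsp_loss = cross_entropy(nsp_probs, nsp_target)
total_loss = mlm_loss + nsp_loss
print(round(total_loss, 4))  # prints 0.5798
```

In practice each training batch contributes many masked positions and one sentence-pair label, but the principle is the same: a single summed loss forces the model to capture both kinds of structure simultaneously.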
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Differentiating Contributions of Pre-training Objectives
Diagnosing a Language Model's Training Deficiency