Tags
Data Science
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A research team is developing a new language model and is deciding between two pre-training approaches:
- Approach A: The model is trained to predict the next word in a sequence, having seen only the words that came before it.
- Approach B: The model is trained to predict a randomly hidden word in a sequence, using all other words in the sequence (both before and after the hidden word) as context.
Based on the key innovations that led to significant performance improvements across a wide range of natural language processing tasks, which approach is more likely to produce a powerful, general-purpose language representation, and why? (A minimal sketch contrasting the two objectives appears below.)
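To make the contrast concrete, here is a minimal, framework-free Python sketch of how training examples are constructed under each objective. The helper names (`causal_lm_examples`, `masked_lm_examples`) and the `[MASK]` placeholder are illustrative assumptions, not any library's API.

```python
# Toy sentence used to build training examples under both objectives.
tokens = ["the", "cat", "sat", "on", "the", "mat"]

def causal_lm_examples(tokens):
    """Approach A: predict each token using only the tokens before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_lm_examples(tokens, masked_index):
    """Approach B: predict a hidden token using all surrounding tokens."""
    context = tokens[:masked_index] + ["[MASK]"] + tokens[masked_index + 1:]
    return (context, tokens[masked_index])

# Approach A sees only left context at every position:
#   (['the'], 'cat'), (['the', 'cat'], 'sat'), ...
print(causal_lm_examples(tokens))

# Approach B sees both left and right context around the hidden word:
#   (['the', 'cat', '[MASK]', 'on', 'the', 'mat'], 'sat')
print(masked_lm_examples(tokens, masked_index=2))
```

Note the trade-off the sketch exposes: the masked objective gives up straightforward left-to-right generation, but every prediction is conditioned on the full surrounding sentence.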
Evaluating the Architectural Impact of Bidirectional Pre-training
The main contribution of the model that popularized deep bidirectional pre-training was its success in showing that strong performance on different natural language processing tasks, such as question answering and sentiment analysis, does not require building entirely separate, highly-engineered model architectures for each specific task: a single pre-trained model, fine-tuned with minimal task-specific additions, can achieve state-of-the-art results across them.
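As an illustration, here is a minimal sketch using the Hugging Face `transformers` library (an assumption; the text names no toolkit) showing how one shared pre-trained bidirectional encoder serves two different tasks with only thin task-specific output heads.

```python
# Minimal sketch, assuming the `transformers` package is installed and the
# `bert-base-uncased` checkpoint is available for download.
from transformers import (
    AutoModelForQuestionAnswering,
    AutoModelForSequenceClassification,
)

checkpoint = "bert-base-uncased"  # one shared pre-trained bidirectional encoder

# Sentiment analysis: the shared encoder plus a small classification head.
sentiment_model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2
)

# Question answering: the same encoder plus a span-prediction head.
qa_model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

# Both models reuse the identical pre-trained backbone; only the final output
# layer differs per task, rather than an entirely separate architecture.
```

Each newly added head starts randomly initialized and is learned during fine-tuning, which is why so little task-specific engineering is needed on top of the shared encoder.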