Tags
Data Science
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A research team is developing a new language model and is deciding between two pre-training approaches:
- Approach A: The model is trained to predict the next word in a sequence, having seen only the words that came before it.
- Approach B: The model is trained to predict a randomly hidden word in a sequence, using all other words in the sequence (both before and after the hidden word) as context.
Based on the key innovations that led to significant performance improvements across a wide range of natural language processing tasks, which approach is more likely to produce a powerful, general-purpose language representation, and why? (A minimal sketch contrasting the two objectives appears below.)
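To make the contrast concrete, here is a minimal, framework-free Python sketch of how training examples are constructed under each objective. The helper names (`causal_lm_examples`, `masked_lm_examples`) and the `[MASK]` placeholder are illustrative assumptions, not any library's API.

```python
# Toy sentence used to build training examples under both objectives.
tokens = ["the", "cat", "sat", "on", "the", "mat"]

def causal_lm_examples(tokens):
    """Approach A: predict each token using only the tokens before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_lm_examples(tokens, masked_index):
    """Approach B: predict a hidden token using all surrounding tokens."""
    context = tokens[:masked_index] + ["[MASK]"] + tokens[masked_index + 1:]
    return (context, tokens[masked_index])

# Approach A sees only left context at every position:
#   (['the'], 'cat'), (['the', 'cat'], 'sat'), ...
print(causal_lm_examples(tokens))

# Approach B sees both left and right context around the hidden word:
#   (['the', 'cat', '[MASK]', 'on', 'the', 'mat'], 'sat')
print(masked_lm_examples(tokens, masked_index=2))
```

Note the trade-off the sketch exposes: the masked objective gives up straightforward left-to-right generation, but every prediction is conditioned on the full surrounding sentence.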
Evaluating the Architectural Impact of Bidirectional Pre-training
The main contribution of the model that popularized deep bidirectional pre-training was its success in showing that strong performance on different natural language processing tasks, such as question answering and sentiment analysis, does not require building entirely separate, highly-engineered model architectures for each specific task: a single pre-trained model, fine-tuned with minimal task-specific additions, can achieve state-of-the-art results across them.
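As an illustration, here is a minimal sketch using the Hugging Face `transformers` library (an assumption; the text names no toolkit) showing how one shared pre-trained bidirectional encoder serves two different tasks with only thin task-specific output heads.

```python
# Minimal sketch, assuming the `transformers` package is installed and the
# `bert-base-uncased` checkpoint is available for download.
from transformers import (
    AutoModelForQuestionAnswering,
    AutoModelForSequenceClassification,
)

checkpoint = "bert-base-uncased"  # one shared pre-trained bidirectional encoder

# Sentiment analysis: the shared encoder plus a small classification head.
sentiment_model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2
)

# Question answering: the same encoder plus a span-prediction head.
qa_model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

# Both models reuse the identical pre-trained backbone; only the final output
# layer differs per task, rather than an entirely separate architecture.
```

Each newly added head starts randomly initialized and is learned during fine-tuning, which is why so little task-specific engineering is needed on top of the shared encoder.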