Multiple Choice

A research team is developing a new language model. They are debating between two pre-training approaches:

  • Approach A: The model is trained to predict the next word in a sequence, having only seen the words that came before it.
  • Approach B: The model is trained to predict a randomly hidden word in a sequence, using all other words in the sequence (both before and after the hidden word) as context.

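As a sketch (the notation below is ours, not part of the question), the two approaches correspond to the standard causal and masked language-modeling objectives over a token sequence $x_1, \dots, x_T$:

$$
\mathcal{L}_{A} = -\sum_{t=1}^{T} \log P\left(x_t \mid x_{<t}\right), \qquad
\mathcal{L}_{B} = -\sum_{i \in \mathcal{M}} \log P\left(x_i \mid x_{\setminus \mathcal{M}}\right)
$$

where $x_{<t}$ is the context to the left of position $t$, $\mathcal{M}$ is the set of masked positions, and $x_{\setminus \mathcal{M}}$ denotes every token outside that set.
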
Based on the key innovations that led to significant performance improvements across a wide range of natural language processing tasks, which approach is more likely to produce a powerful, general-purpose language representation, and why?
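
A minimal runnable sketch (plain Python, no ML framework; the toy sentence and the [MASK] token are illustrative assumptions, not part of the question) of how training examples differ between the two approaches:

```python
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Approach A: predict each word from the words to its LEFT only.
causal_examples = [
    (tokens[:i], tokens[i])  # (visible context, target word)
    for i in range(1, len(tokens))
]
# e.g. (["the", "cat"], "sat") -- "on", "the", "mat" stay hidden.

# Approach B: hide one word, predict it using BOTH sides as context.
hidden = 2  # hide "sat"
masked_input = tokens[:hidden] + ["[MASK]"] + tokens[hidden + 1:]
masked_example = (masked_input, tokens[hidden])
# e.g. (["the", "cat", "[MASK]", "on", "the", "mat"], "sat")

print(causal_examples)
print(masked_example)
```

The contrast in how much context is visible for each prediction is the crux the question asks about.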

Tags

  • Data Science
  • Ch.1 Pre-training - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences
  • Analysis in Bloom's Taxonomy
  • Cognitive Psychology
  • Psychology
  • Social Science
  • Empirical Science
  • Science