Multiple Choice

A research team is developing a new language model. They are debating between two pre-training approaches:

  • Approach A: The model is trained to predict the next word in a sequence, having only seen the words that came before it.
  • Approach B: The model is trained to predict a randomly hidden word in a sequence, using all other words in the sequence (both before and after the hidden word) as context.

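As a sketch (the notation below is ours, not part of the question), the two approaches correspond to the standard causal and masked language-modeling objectives over a token sequence $x_1, \dots, x_T$:

$$
\mathcal{L}_{A} = -\sum_{t=1}^{T} \log P\left(x_t \mid x_{<t}\right), \qquad
\mathcal{L}_{B} = -\sum_{i \in \mathcal{M}} \log P\left(x_i \mid x_{\setminus \mathcal{M}}\right)
$$

where $x_{<t}$ is the context to the left of position $t$, $\mathcal{M}$ is the set of masked positions, and $x_{\setminus \mathcal{M}}$ denotes every token outside that set.
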
Based on the key innovations that led to significant performance improvements across a wide range of natural language processing tasks, which approach is more likely to produce a powerful, general-purpose language representation, and why?
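
A minimal runnable sketch (plain Python, no ML framework; the toy sentence and the [MASK] token are illustrative assumptions, not part of the question) of how training examples differ between the two approaches:

```python
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Approach A: predict each word from the words to its LEFT only.
causal_examples = [
    (tokens[:i], tokens[i])  # (visible context, target word)
    for i in range(1, len(tokens))
]
# e.g. (["the", "cat"], "sat") -- "on", "the", "mat" stay hidden.

# Approach B: hide one word, predict it using BOTH sides as context.
hidden = 2  # hide "sat"
masked_input = tokens[:hidden] + ["[MASK]"] + tokens[hidden + 1:]
masked_example = (masked_input, tokens[hidden])
# e.g. (["the", "cat", "[MASK]", "on", "the", "mat"], "sat")

print(causal_examples)
print(masked_example)
```

The contrast in how much context is visible for each prediction is the crux the question asks about.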

Tags

  • Data Science
  • Ch.1 Pre-training - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences
  • Analysis in Bloom's Taxonomy
  • Cognitive Psychology
  • Psychology
  • Social Science
  • Empirical Science
  • Science