Multiple Choice

A research lab is developing a new large-scale language model. It has access to a state-of-the-art neural architecture designed to process long text sequences effectively, and is deciding between two training strategies:

  1. Strategy A: Train the model from scratch on a high-quality, human-labeled dataset of 1 million examples specifically designed for question-answering.
  2. Strategy B: First, train the model on a massive, unlabeled corpus of 1 trillion words from the internet with the objective of predicting the next word in a sentence. Then, optionally, adapt it to specific tasks.

Which strategy is more likely to produce a powerful, general-purpose model capable of a wide range of language understanding and generation tasks, and why?
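Strategy B's objective, predicting the next word in a sentence, can be illustrated with a toy count-based bigram model. This is only a sketch of the training objective: a real LLM replaces the counts with a neural network, and the corpus, names, and probabilities here are illustrative, not from the question.

```python
import math
from collections import Counter, defaultdict

# Toy illustration of Strategy B's next-word-prediction objective.
# A hypothetical bigram count model stands in for the neural network
# so the loss computation is easy to follow.
corpus = "the cat sat on the mat the cat ate".split()

# Count how often each word follows each context word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_prob(prev, nxt):
    """Empirical probability of `nxt` given the previous word `prev`."""
    counts = follows[prev]
    total = sum(counts.values())
    return counts[nxt] / total if total else 0.0

# Pre-training minimizes the average negative log-likelihood (NLL)
# of the actual next word across the corpus.
nll = -sum(math.log(next_word_prob(p, n)) for p, n in zip(corpus, corpus[1:]))
avg_nll = nll / (len(corpus) - 1)
print(f"average next-word NLL: {avg_nll:.3f}")
```

The same objective, applied at trillion-word scale with a neural network instead of counts, is what lets Strategy B learn broad language regularities without any human labels.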


Updated 2025-09-28


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy
