Multiple Choice

A research lab is developing a new large-scale language model. It has access to a state-of-the-art neural architecture designed to process long text sequences effectively, and is deciding between two training strategies:

  1. Strategy A: Train the model from scratch on a high-quality, human-labeled dataset of 1 million examples specifically designed for question-answering.
  2. Strategy B: First, train the model on a massive, unlabeled corpus of 1 trillion words from the internet with the objective of predicting the next word in a sentence. Then, optionally, adapt it to specific tasks.

Which strategy is more likely to produce a powerful, general-purpose model capable of a wide range of language understanding and generation tasks, and why?
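Strategy B's objective, predicting the next word in a sentence, can be illustrated with a toy count-based bigram model. This is only a sketch of the training objective: a real LLM replaces the counts with a neural network, and the corpus, names, and probabilities here are illustrative, not from the question.

```python
import math
from collections import Counter, defaultdict

# Toy illustration of Strategy B's next-word-prediction objective.
# A hypothetical bigram count model stands in for the neural network
# so the loss computation is easy to follow.
corpus = "the cat sat on the mat the cat ate".split()

# Count how often each word follows each context word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_prob(prev, nxt):
    """Empirical probability of `nxt` given the previous word `prev`."""
    counts = follows[prev]
    total = sum(counts.values())
    return counts[nxt] / total if total else 0.0

# Pre-training minimizes the average negative log-likelihood (NLL)
# of the actual next word across the corpus.
nll = -sum(math.log(next_word_prob(p, n)) for p, n in zip(corpus, corpus[1:]))
avg_nll = nll / (len(corpus) - 1)
print(f"average next-word NLL: {avg_nll:.3f}")
```

The same objective, applied at trillion-word scale with a neural network instead of counts, is what lets Strategy B learn broad language regularities without any human labels.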


Updated 2025-09-28


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy
