Evaluating a Researcher's Conclusion on Model Training
A machine learning researcher pre-trains a large language model on a vast dataset of web text. They observe that the model excels at predicting the next word in a sequence but fails to follow simple instructions, such as 'Write a poem about a robot.' The researcher concludes, 'The pre-training phase only teaches the model statistical patterns of language, not any real capabilities for following instructions. These abilities must be built entirely from scratch during a subsequent instruction-tuning phase.'
Evaluate the researcher's conclusion. Is it fully correct, partially correct, or incorrect? Justify your answer based on the principles of how capabilities are developed in large language models.
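For evaluators, it may help to see why the conclusion is only partially correct: pre-training and instruction tuning optimize the same next-token cross-entropy objective, and only the training data differs. The sketch below is a hypothetical toy example (invented vocabulary and probabilities, not a real model) illustrating that shared loss.

```python
# Toy sketch (hypothetical example): pre-training and instruction tuning
# both minimize the SAME next-token cross-entropy loss; only the data
# (raw web text vs. curated instruction-response pairs) differs.
import math

def next_token_loss(probs, target_id):
    """Cross-entropy for one predicted next token."""
    return -math.log(probs[target_id])

# Made-up vocabulary and a made-up model distribution over the next token.
vocab = ["robot", "poem", "the", "<eos>"]
predicted = [0.1, 0.6, 0.2, 0.1]  # model's next-token probabilities

# Pre-training example: raw text -- the target is whatever word came next.
pretrain_loss = next_token_loss(predicted, vocab.index("the"))

# Instruction-tuning example: the target comes from a curated response.
# Note it is the same loss function, applied to different data.
instruct_loss = next_token_loss(predicted, vocab.index("poem"))

print(round(pretrain_loss, 3), round(instruct_loss, 3))
```

Because the objective is unchanged, instruction tuning is best understood as steering capabilities the model already acquired during pre-training toward an instruction-following format, not as building them from scratch.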
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A research team develops a large language model by training it on a massive corpus of text from the internet. When they give the model the instruction, 'Translate the following English sentence to French,' the model instead continues the sentence in English with a grammatically correct but irrelevant phrase. However, after a second, much shorter training phase using a small, curated dataset of English-to-French sentence pairs, the model correctly performs the translation task. Which of the following statements best explains this change in the model's behavior?
Fine-Tuning as a Mechanism for Activating Pre-Trained Knowledge
The primary purpose of the supervised fine-tuning phase that follows pre-training is to introduce entirely new capabilities, such as the ability to summarize text, which the model did not acquire in any form during its initial, large-scale training.