Explaining Emergent Zero-Shot Abilities
A research team has just finished the pre-training phase for a new large language model, using a massive corpus of text and code from the public internet. Before beginning any instruction fine-tuning, a researcher tests the model with the following prompt:
```
Text: 'The sun is a star at the center of the Solar System. It is a nearly perfect sphere of hot plasma, with internal convective motion that generates a magnetic field via a dynamo process.'

Summarize the preceding text in one sentence.
```
To their surprise, the model responds with: 'The sun is a plasma star at the center of our solar system that generates a magnetic field.'
Based on the principles of how models learn during pre-training, provide the most likely explanation for why the model was able to perform this summarization task without any explicit fine-tuning.
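For concreteness, the test can be reproduced against any pre-trained-only checkpoint. The sketch below assumes the Hugging Face `transformers` library and uses `gpt2` purely as a stand-in base model (the scenario's actual model is unspecified); a small base model may comply only partially, which is itself consistent with the scale-dependent nature of the ability in question.

```python
# A minimal sketch, assuming the Hugging Face `transformers` library.
# `gpt2` is a stand-in for a pre-trained-only (not instruction-tuned) model;
# the research team's actual checkpoint is not specified in the scenario.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Text: 'The sun is a star at the center of the Solar System. It is a "
    "nearly perfect sphere of hot plasma, with internal convective motion "
    "that generates a magnetic field via a dynamo process.'\n"
    "Summarize the preceding text in one sentence."
)

# Greedy decoding keeps the continuation deterministic. A base model is only
# predicting likely next tokens, so any apparent "instruction following" must
# come from patterns it absorbed during pre-training.
result = generator(prompt, max_new_tokens=40, do_sample=False)
print(result[0]["generated_text"])
```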
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Related
Challenge of Opaque Pre-Training Data in Fine-Tuning
A machine learning engineer claims, "A language model's ability to follow instructions is exclusively a result of the targeted examples shown during its fine-tuning stage. The pre-training phase only provides it with general world knowledge and language structure."
Which of the following statements provides the most accurate evaluation of this claim? (One way to probe the claim empirically is sketched after this list.)
Explaining Unexpected Model Capabilities
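As referenced above, one way to probe the engineer's claim is to give the same instruction to a pre-trained-only model and to an instruction-tuned one and compare their behavior. The sketch below again assumes the Hugging Face `transformers` library; `gpt2` and `google/flan-t5-small` are illustrative stand-ins, not models named in the question.

```python
# A minimal probing sketch, assuming the Hugging Face `transformers` library.
# The checkpoints are illustrative stand-ins: `gpt2` (pre-training only) and
# `google/flan-t5-small` (instruction fine-tuned).
from transformers import pipeline

instruction = "Summarize: The mitochondrion is the powerhouse of the cell."

# Base model: next-token prediction learned from raw web text alone.
base = pipeline("text-generation", model="gpt2")
print(base(instruction, max_new_tokens=30, do_sample=False)[0]["generated_text"])

# Instruction-tuned model: explicitly trained on instruction/response pairs.
tuned = pipeline("text2text-generation", model="google/flan-t5-small")
print(tuned(instruction, max_new_tokens=30)[0]["generated_text"])

# If the base model shows even partial task behavior, instruction following
# cannot be exclusively a product of the fine-tuning stage.
```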