Knowledge Acquisition in LLMs through Scaled Token Prediction
Next-token prediction is a simple objective, but performing it repeatedly over massive text corpora enables large language models to acquire a broad, general understanding of language and of the world described in their training data. This emergent capability goes well beyond surface-level language modeling and forms the basis of the models' performance on downstream tasks.
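To make the objective concrete, the sketch below shows what is actually being minimized during pre-training. It is a minimal illustration, assuming the Hugging Face transformers library and the public GPT-2 checkpoint; neither is prescribed by the text above, and any causal language model would serve. Passing the input IDs as labels asks the model for its average next-token cross-entropy over the sequence.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load a small pretrained causal language model (GPT-2, an illustrative choice).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "Paris is the capital of France."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels=input_ids, the model shifts the targets internally and
    # returns the average cross-entropy of predicting each token from the
    # tokens that precede it: the next-token prediction loss.
    out = model(**inputs, labels=inputs["input_ids"])

print(f"next-token cross-entropy: {out.loss.item():.3f}")

Pre-training consists of nothing more than driving this one number down across a vast corpus; the capabilities described above emerge from that process.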
Analyzing Model Behavior

A language model is processing the sequence 'The first three letters of the alphabet are A, B,'. Based on its fundamental training objective, what is the model's immediate goal at this exact moment?
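The answer is narrower than it may appear, and easy to inspect directly: at that exact moment, the model's entire output is a probability distribution over its vocabulary for the single next token. The sketch below (same illustrative assumptions as before: the transformers library and GPT-2 standing in for any causal language model) prints the top candidates.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The first three letters of the alphabet are A, B,"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# The model's immediate goal: a probability distribution over the
# vocabulary for the next token. Inspect the five most probable candidates.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, tok_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(tok_id.item())!r}: p = {p.item():.3f}")

A well-trained model should concentrate most of this probability mass on ' C', though the exact numbers depend on the checkpoint; greedy decoding would simply pick the argmax of this distribution.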
The Emergence of Knowledge from a Simple Objective

A common observation is that large language models, despite being trained only to predict the next token in a sequence, can perform tasks that seem to require genuine world knowledge. What is the primary reason for this emergent capability? It is the scale of pre-training itself: to keep getting next-token predictions right across an enormous corpus, the model is forced to internalize the facts and regularities that the text encodes.

It would be a misconception, for instance, to think that a model's ability to answer factual questions about history is the result of a separate training phase focused specifically on memorizing historical facts, distinct from its primary language modeling task. No such phase exists in pre-training; that knowledge is absorbed through the same next-token objective.
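A small experiment underlines the point. The sketch below (again assuming transformers and GPT-2; the prompt is an illustrative choice, not taken from the text) greedily decodes a factual completion. No fact-memorization phase was ever run on this model: whatever it produces was absorbed through the same next-token objective applied at scale.

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding: repeatedly append the most probable next token.
out = model.generate(**inputs, max_new_tokens=3, do_sample=False,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))
# A sufficiently trained model tends to continue with "Paris", though
# the exact continuation depends on the checkpoint.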