Balancing Generalization and Specialization
A team is developing a large language model. During the initial training phase on a massive, diverse dataset, they focus on an objective that encourages the model to predict the next word in a sentence. Later, for a specific customer, they further train the model on a small, curated dataset of legal documents to improve its ability to answer legal questions. Explain how these two distinct training stages address the two fundamental challenges of the pre-training paradigm.
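The two stages can be illustrated with a deliberately tiny sketch. The real systems use neural networks trained by gradient descent, but a toy bigram count model captures the same idea: a general-purpose next-word objective over a broad corpus (pre-training), followed by continued training on a small domain corpus that shifts the model's predictions toward that domain (fine-tuning). All class names, corpora, and the `weight` parameter here are illustrative assumptions, not part of any real training pipeline.

```python
from collections import defaultdict

class BigramLM:
    """Toy next-word predictor: counts how often each word follows another."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(float))

    def train(self, corpus, weight=1.0):
        # Next-word prediction objective: record (previous word -> next word) pairs.
        for sentence in corpus:
            tokens = sentence.lower().split()
            for prev, nxt in zip(tokens, tokens[1:]):
                self.counts[prev][nxt] += weight

    def predict(self, word):
        # Return the most frequent continuation seen in training.
        nxt = self.counts.get(word.lower())
        return max(nxt, key=nxt.get) if nxt else None

# Stage 1: "pre-training" on a broad, diverse corpus.
general = [
    "the cat sat on the mat",
    "the court is in session",
    "the dog ran in the park",
]
# Stage 2: "fine-tuning" on a small, curated legal corpus.
legal = [
    "the court ruled in favor of the plaintiff",
    "the court ruled against the defendant",
]

lm = BigramLM()
lm.train(general)            # general knowledge: "court" -> "is"
lm.train(legal, weight=5.0)  # domain adaptation: "court" -> "ruled"
```

After stage 1 alone, `lm.predict("court")` reflects general usage; after the upweighted legal corpus is added, the same query yields the domain-specific continuation. This mirrors how fine-tuning specializes a generally capable pre-trained model without training it from scratch.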
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analyzing a Model Development Lifecycle
A research lab is developing a new foundation model with a limited computational budget. They are considering two primary approaches for the initial training phase:
- Approach 1: Train the model on an extremely large and diverse dataset, incorporating text from the web, academic articles, books, and code, using a general-purpose learning objective.
- Approach 2: Train the model on a smaller, but very high-quality, curated dataset focused on a few key domains (e.g., customer service and technical support dialogues) and then immediately test its performance on tasks within those domains.
Which statement best analyzes the fundamental trade-off between these two approaches in the context of building a foundation model?