Critiquing the 'Perfect Dataset' Hypothesis for Alignment
An AI research group argues that the key to creating a perfectly aligned language model is to build a 'gold standard' pre-training dataset. They propose a multi-year project to collect and filter text that exclusively represents ideal, helpful, and harmless human interactions, claiming that a model trained only on this dataset would not require any subsequent alignment tuning. Critique this argument by identifying and explaining the two main practical challenges that make this 'pre-training-only' approach infeasible.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Necessity of Post-Pre-training Alignment
Evaluating a Pre-training-Only Strategy
A research lab proposes a new strategy for creating a perfectly helpful and harmless language model. Their plan is to spend five years meticulously curating a massive dataset of text and code that contains only examples of positive, safe, and beneficial interactions. They argue that a model pre-trained exclusively on this 'perfect' dataset will require no further alignment steps. Which of the following statements identifies the most critical flaw in this strategy's approach to alignment?