Learn Before
Critique of Human Feedback for Model Alignment
A project lead for a new AI assistant states, 'Relying on human preference data is the only truly effective way to ensure our model is helpful and harmless. The associated challenges are simply the cost of doing good work and can be overcome with a sufficient budget.' Critique this statement. In your response, evaluate the validity of the project lead's position by discussing the full spectrum of limitations associated with this alignment approach.
0
1
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
AI Feedback as an Alternative to Human Feedback
Evaluating an AI Alignment Strategy
A startup is aligning a new AI financial advisor using preference feedback. The data is collected exclusively from a small, culturally uniform group of the company's own financial experts. Based on the known challenges of this alignment method, what is the most critical potential flaw in this approach?
Critique of Human Feedback for Model Alignment