Analysis of an LLM Alignment Failure
Based on the following scenario, analyze the fundamental flaw in the team's alignment strategy and explain why it resulted in a model that fails to generalize.
Tags
Ch.4 Alignment - Foundations of Large Language Models
A development team aims to align a large language model with the complex value of 'being helpful'. Their strategy is to build a high-quality dataset of 50,000 question-and-answer pairs in which each response has been rated 'very helpful' by human annotators. They then fine-tune the model with the sole objective of maximizing its likelihood of reproducing these exact 'very helpful' answers. Which statement best evaluates the fundamental limitation of this data-fitting approach for achieving the team's goal?
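To make the scenario's objective concrete, here is a minimal sketch of what the team is actually optimizing: the token-level negative log-likelihood of the annotated reference answers. The probabilities below are hypothetical, and the three-token answers are a toy simplification; the point is that the loss rewards only exact reproduction of the reference string, so an equally helpful paraphrase contributes nothing.

```python
import math

def sft_loss(per_token_probs):
    # Mean negative log-likelihood of the reference tokens.
    # The only training signal is "reproduce this exact string";
    # helpfulness itself is never measured.
    return -sum(math.log(p) for p in per_token_probs) / len(per_token_probs)

# Hypothetical per-token probabilities the fine-tuned model assigns to:
reference  = [0.9, 0.9, 0.9]   # the annotated 'very helpful' answer
paraphrase = [0.1, 0.1, 0.1]   # an equally helpful answer, worded differently

loss_ref  = sft_loss(reference)    # low: the model imitates the reference
loss_para = sft_loss(paraphrase)   # high: the objective penalizes it anyway
```

Under this objective the model is pushed toward the surface form of the 50,000 answers, not toward the underlying value of helpfulness, which is why it can fail to generalize to questions outside the dataset's distribution.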