A development team aims to align a large language model with the complex value of 'being helpful'. Their strategy is to create a high-quality dataset of 50,000 question-and-answer pairs in which the model's response is rated 'very helpful' by human annotators. They then fine-tune the model with the sole objective of maximizing its ability to reproduce these exact 'very helpful' answers. Which statement best evaluates the fundamental limitation of this data-fitting approach for achieving the team's goal?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analysis of an LLM Alignment Failure
Limitations of Supervised Fine-Tuning for Value Alignment