Limitations of Supervised Fine-Tuning for Value Alignment
A team is trying to make a large language model 'ethically considerate'. Their approach is to collect a million examples of ethically sound statements and train the model to mimic them perfectly. Explain the primary limitation of this 'data-fitting' strategy for instilling a general understanding of ethical considerations in the model.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A development team aims to align a large language model with the complex value of 'being helpful'. Their strategy is to create a high-quality dataset of 50,000 question-and-answer pairs where the model's response is rated as 'very helpful' by human annotators. They then fine-tune the model with the sole objective of maximizing its ability to reproduce these exact 'very helpful' answers. Which statement best evaluates the fundamental limitation of this data-fitting approach for achieving the team's goal?
Analysis of an LLM Alignment Failure