Analysis of an LLM Alignment Failure
Based on the following scenario, analyze the fundamental flaw in the team's alignment strategy and explain why it resulted in a model that fails to generalize.
Tags
Ch.4 Alignment - Foundations of Large Language Models
A development team aims to align a large language model with the complex value of 'being helpful'. Their strategy is to build a high-quality dataset of 50,000 question-and-answer pairs in which each response has been rated 'very helpful' by human annotators. They then fine-tune the model with the sole objective of maximizing its likelihood of reproducing these exact 'very helpful' answers. Which statement best evaluates the fundamental limitation of this data-fitting approach for achieving the team's goal?
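To make the scenario's objective concrete, here is a minimal sketch of what the team is actually optimizing: the token-level negative log-likelihood of the annotated reference answers. The probabilities below are hypothetical, and the three-token answers are a toy simplification; the point is that the loss rewards only exact reproduction of the reference string, so an equally helpful paraphrase contributes nothing.

```python
import math

def sft_loss(per_token_probs):
    # Mean negative log-likelihood of the reference tokens.
    # The only training signal is "reproduce this exact string";
    # helpfulness itself is never measured.
    return -sum(math.log(p) for p in per_token_probs) / len(per_token_probs)

# Hypothetical per-token probabilities the fine-tuned model assigns to:
reference  = [0.9, 0.9, 0.9]   # the annotated 'very helpful' answer
paraphrase = [0.1, 0.1, 0.1]   # an equally helpful answer, worded differently

loss_ref  = sft_loss(reference)    # low: the model imitates the reference
loss_para = sft_loss(paraphrase)   # high: the objective penalizes it anyway
```

Under this objective the model is pushed toward the surface form of the 50,000 answers, not toward the underlying value of helpfulness, which is why it can fail to generalize to questions outside the dataset's distribution.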