Learn Before
Challenge of Defining Human Values for AI Objectives
A major obstacle in AI alignment is translating the wide-ranging, context-sensitive nature of human values into a single, coherent objective function for an AI system to optimize.
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Shift in LLM Alignment from Predefined Tasks to Real-World Interaction
Impracticality of Achieving Alignment Solely Through Pre-training
Need for Diverse Alignment Methods
Insufficiency of Data Fitting for Value Alignment
Difficulty of Encoding Human Values in Datasets
Inarticulacy of Human Preferences as an Alignment Challenge
Goodhart's Law
Real-World Complexity as an Alignment Challenge
Specification Gaming in AI Alignment
Alignment Challenges as a Motivator for AI Research
Diversity and Fluidity of Human Values as an Alignment Challenge
Analysis of an LLM Alignment Failure
A development team building a chatbot aims for it to be "helpful" to all users. They discover that behaviors users in one country praise as helpful are criticized as intrusive by users in another, and the issue persists even after training the model on vast, culturally diverse datasets. Which fundamental challenge in guiding a model's behavior does this scenario best illustrate?
Evaluating Core Difficulties in Model Behavior Guidance
Challenge of Defining Human Values for AI Objectives