Learn Before
Goodhart's Law
Goodhart's Law is a principle which states that when a measure becomes a target, it ceases to be a good measure. The act of optimizing for a specific metric can distort the very system it is meant to monitor, thereby invalidating the metric's usefulness as an indicator of the original goal.
0
1
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Shift in LLM Alignment from Predefined Tasks to Real-World Interaction
Impracticality of Achieving Alignment Solely Through Pre-training
Need for Diverse Alignment Methods
Insufficiency of Data Fitting for Value Alignment
Difficulty of Encoding Human Values in Datasets
Inarticulacy of Human Preferences as an Alignment Challenge
Goodhart's Law
Real-World Complexity as an Alignment Challenge
Specification Gaming in AI Alignment
Alignment Challenges as a Motivator for AI Research
Diversity and Fluidity of Human Values as an Alignment Challenge
Analysis of an LLM Alignment Failure
A development team building a chatbot aims for it to be 'helpful' to all users. They discover that behaviors praised as helpful by users in one country are criticized as intrusive by users in another. This issue persists even after training the model on vast, culturally diverse datasets. Which fundamental challenge in guiding a model's behavior does this scenario best illustrate?
Evaluating Core Difficulties in Model Behavior Guidance
Challenge of Defining Human Values for AI Objectives
Learn After
A company develops a large language model to summarize news articles. To evaluate its performance, they decide to reward the model for producing summaries that have a high percentage of word overlap with the original article's headline, believing this indicates the summary has captured the main point. After training, they find the model produces summaries that are often just slight rephrasings of the headline, failing to include other crucial information from the article. Which of the following principles best explains this unintended outcome?
Critique of an AI Performance Metric
AI Content Strategy Analysis