Learn Before
Surrogate Objectives in AI Alignment
A common strategy in AI alignment involves creating a 'surrogate objective': a measurable proxy goal designed to approximate the true, often more complex, intended objective. The AI system is then trained to optimize for this surrogate.
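The gap between a surrogate and the true objective can be illustrated with a toy sketch (not from the source; every metric and name below is a hypothetical assumption for illustration): an optimizer that greedily improves a measurable proxy can simultaneously degrade the harder-to-measure true goal.

```python
# Toy illustration: optimizing a measurable surrogate objective can
# diverge from the true, harder-to-measure objective (Goodhart-style
# misalignment). All metrics here are invented for illustration only.

def true_objective(text: str) -> float:
    """Hypothetical 'true' goal: concise text that mentions key terms."""
    words = text.split()
    coverage = sum(w in {"gradient", "loss"} for w in set(words))
    brevity_penalty = max(0, len(words) - 10) * 0.5
    return coverage - brevity_penalty

def surrogate_objective(text: str) -> float:
    """Measurable proxy: longer text scores higher (raw word count)."""
    return float(len(text.split()))

# Greedy 'training' loop: repeatedly apply the edit that raises the
# surrogate score. Padding always helps the proxy, never the true goal.
text = "loss gradient"
for _ in range(20):
    text += " filler"

print(surrogate_objective(text))  # proxy score keeps rising
print(true_objective(text))       # true score has collapsed below zero
```

The proxy rewards length, so the optimizer pads; the true objective penalizes padding, so it falls. This is exactly the failure mode the later "Evaluating Surrogate Objectives" exercises probe.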
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Combined Use of Instruction and Human Preference Alignment
Differing Motivations of Instruction and Human Preference Alignment
Instruction Alignment
Human Preference Alignment via Reward Models
A development team is working to improve a large language model's behavior. They create two distinct datasets:
- Dataset 1: A curated set of prompts, each paired with a single, ideal, human-written response that demonstrates how to follow the prompt's instructions correctly.
- Dataset 2: A set of prompts where, for each prompt, a human evaluator has ranked several different model-generated responses from best to worst.
Which statement best analyzes the relationship between these datasets and the two fundamental approaches to model alignment?
Match each fundamental model alignment approach with its primary goal and typical implementation method.
Prioritizing Chatbot Alignment Strategies
Learn After
Evaluating Surrogate Objectives for a News-Summarizing AI
A development team is training an AI to write helpful and engaging online tutorials. The true, complex objective is to 'create content that effectively teaches users a new skill.' To make this measurable, the team chooses a surrogate objective: 'maximize the word count of the tutorial and the number of technical terms used.' Which of the following outcomes is the most likely form of misalignment to result from this choice?
Evaluating Surrogate Objectives for a Mental Well-being AI