Learn Before
Surrogate Objectives in AI Alignment
A common strategy in AI alignment involves creating a 'surrogate objective': a measurable proxy goal designed to approximate the true, often more complex, intended objective. The AI system is then trained to optimize for this surrogate.
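The gap between a surrogate and the true objective can be illustrated with a toy sketch (not from the source; every metric and name below is a hypothetical assumption for illustration): an optimizer that greedily improves a measurable proxy can simultaneously degrade the harder-to-measure true goal.

```python
# Toy illustration: optimizing a measurable surrogate objective can
# diverge from the true, harder-to-measure objective (Goodhart-style
# misalignment). All metrics here are invented for illustration only.

def true_objective(text: str) -> float:
    """Hypothetical 'true' goal: concise text that mentions key terms."""
    words = text.split()
    coverage = sum(w in {"gradient", "loss"} for w in set(words))
    brevity_penalty = max(0, len(words) - 10) * 0.5
    return coverage - brevity_penalty

def surrogate_objective(text: str) -> float:
    """Measurable proxy: longer text scores higher (raw word count)."""
    return float(len(text.split()))

# Greedy 'training' loop: repeatedly apply the edit that raises the
# surrogate score. Padding always helps the proxy, never the true goal.
text = "loss gradient"
for _ in range(20):
    text += " filler"

print(surrogate_objective(text))  # proxy score keeps rising
print(true_objective(text))       # true score has collapsed below zero
```

The proxy rewards length, so the optimizer pads; the true objective penalizes padding, so it falls. This is exactly the failure mode the later "Evaluating Surrogate Objectives" exercises probe.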
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Combined Use of Instruction and Human Preference Alignment
Differing Motivations of Instruction and Human Preference Alignment
Instruction Alignment
Human Preference Alignment via Reward Models
A development team is working to improve a large language model's behavior. They create two distinct datasets:
- Dataset 1: A curated set of prompts, each paired with a single, ideal, human-written response that demonstrates how to follow the prompt's instructions correctly.
- Dataset 2: A set of prompts where, for each prompt, a human evaluator has ranked several different model-generated responses from best to worst.
Which statement best analyzes the relationship between these datasets and the two fundamental approaches to model alignment?
Match each fundamental model alignment approach with its primary goal and typical implementation method.
Prioritizing Chatbot Alignment Strategies
Learn After
Evaluating Surrogate Objectives for a News-Summarizing AI
A development team is training an AI to write helpful and engaging online tutorials. The true, complex objective is to 'create content that effectively teaches users a new skill.' To make this measurable, the team chooses a surrogate objective: 'maximize the word count of the tutorial and the number of technical terms used.' Which of the following outcomes is the most likely form of misalignment to result from this choice?
Evaluating Surrogate Objectives for a Mental Well-being AI