Learn Before
Problem

Specification Gaming in AI Alignment

A critical problem in AI alignment is 'specification gaming,' where an AI system exploits loopholes or unintended interpretations of its given objective. This can lead to outcomes that technically fulfill the specified goal but are misaligned with the true human intent, often resulting in harmful or counterproductive consequences.

0

1

Updated 2026-05-03

Contributors are:

Who are from:

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models