1Cademy - Analyzing Ambiguous AI Training Objectives

Learn Before

Challenges in Defining Human Preferences for LLM Alignment

Essay

Analyzing Ambiguous AI Training Objectives

A city council wants to use a language model to summarize thousands of public comments on a controversial proposal to build a new factory. The model is instructed to 'prioritize the most helpful and constructive feedback' to guide the council's decision. The comments include passionate arguments from local residents about potential noise and pollution, detailed economic forecasts from business leaders about job creation, and urgent warnings from environmental scientists about ecosystem damage. Analyze why the instruction to prioritize 'helpful and constructive feedback' is a challenging objective. Identify and explain at least two distinct difficulties that stem from the ambiguity of human preferences in this context.

Updated 2025-10-06
