Short Answer

Impact of a Biased Reward Model on Value Function Training

An AI agent is being trained to generate helpful summaries. Its state-value function, which estimates the expected future rewards for a given summary-in-progress, is trained using a separate reward model. Imagine this reward model has a flaw: it is heavily biased towards summaries that include specific, uncommon keywords, regardless of the summary's overall quality or relevance. Describe two specific consequences this bias will have on the trained state-value function and the agent's resulting summary-generation behavior.


Updated 2025-10-05


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science