Short Answer

Impact of a Biased Reward Model on Value Function Training

An AI agent is being trained to generate helpful summaries. Its state-value function, which estimates the expected future rewards for a given summary-in-progress, is trained using a separate reward model. Imagine this reward model has a flaw: it is heavily biased towards summaries that include specific, uncommon keywords, regardless of the summary's overall quality or relevance. Describe two specific consequences this bias will have on the trained state-value function and the agent's resulting summary-generation behavior.


Updated 2025-10-05


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science