1Cademy - Analyzing a Reward System's Weakness

Learn Before

Outcome Reward Models

Short Answer

Analyzing a Reward System's Weakness

A language model is being trained to solve multi-step math word problems. The training system checks only whether the model's final numerical answer is correct: a correct final answer earns a positive reward, and any other answer earns a negative reward. Describe a significant potential weakness of this training approach with respect to the model's actual problem-solving capabilities.
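For reference, the reward rule described in the prompt can be sketched as a small function. This is only an illustrative sketch, not the actual training system: the name `outcome_reward`, the reward values of +1.0 and -1.0, the numeric tolerance, and the sample word problem are all assumptions introduced here.

```python
import math


def outcome_reward(final_answer: float, reference_answer: float, tol: float = 1e-9) -> float:
    """Hypothetical outcome-only reward: score the final number and nothing else.

    The +1.0 / -1.0 values and the tolerance are assumed for illustration;
    the prompt only says "positive" and "negative" reward.
    """
    is_correct = math.isclose(final_answer, reference_answer, rel_tol=tol, abs_tol=tol)
    return 1.0 if is_correct else -1.0


# Assumed example problem: "A box holds 12 pencils. How many pencils are in 4 boxes?"
reference = 48.0
print(outcome_reward(48.0, reference))  # 1.0
print(outcome_reward(46.0, reference))  # -1.0
```

The function's only inputs are the final answer and the reference value; the intermediate steps the model wrote do not appear anywhere in its signature.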

Updated 2025-10-02
