Learn Before
Insufficiency of Outcome-Based Rewards for Complex Reasoning
For tasks that require complex reasoning, reward models that evaluate only the correctness of the final output are insufficient for effective learning. Such outcome-based feedback carries no information about where errors occur within the reasoning process, so it cannot guide the model on how to improve its individual problem-solving steps.
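As a concrete illustration (a minimal sketch, not part of the original card; the step list and the step_is_valid scorer are hypothetical stand-ins for a learned process reward model), the contrast between the two feedback styles can be written as:

def outcome_reward(final_answer: str, reference: str) -> float:
    # Outcome-based: one scalar for the entire solution. A chain with a
    # faulty step that luckily reaches the right answer earns full reward,
    # and a mostly correct chain that slips at the end earns nothing;
    # neither signal says which step went wrong.
    return 1.0 if final_answer == reference else 0.0

def process_reward(steps: list[str], step_is_valid) -> list[float]:
    # Process-based: one score per reasoning step, which localizes the
    # first faulty step and tells the model where to improve.
    return [1.0 if step_is_valid(step) else 0.0 for step in steps]

Under the outcome-based signal, every step of a failed solution is penalized equally, correct steps included, which is precisely the weakness the card describes.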
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Example of an Outcome-Based Reward Model in Mathematics
Insufficiency of Outcome-Based Rewards for Complex Reasoning
A company is training a language model to act as an automated assistant for processing loan applications. The model must follow a specific, legally mandated, multi-step procedure to ensure fairness and compliance (e.g., checking credit history, verifying income, providing specific disclosures). The company decides to train the model using a system that provides a large positive reward only if the final loan decision (approve/deny) is correct based on the applicant's overall profile. What is the most significant weakness of this training strategy? (A sketch of this reward setup follows the list below.)
Evaluating Reward Model Suitability
Reward Model Suitability for a Creative Task
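Returning to the loan-application question above: a minimal sketch, using hypothetical names (REQUIRED_STEPS, performed_steps) that are not part of the card, of why an outcome-only reward cannot detect a skipped, legally mandated step:

REQUIRED_STEPS = ("check_credit_history", "verify_income", "provide_disclosures")

def outcome_only_reward(decision: str, correct_decision: str) -> float:
    # Rewards the final approve/deny call alone: a model that skips
    # every mandated step but guesses the right decision scores 1.0.
    return 1.0 if decision == correct_decision else 0.0

def compliance_aware_reward(decision: str, correct_decision: str,
                            performed_steps: set[str]) -> float:
    # A process-aware alternative: credit for the outcome plus credit
    # for each mandated step the model actually performed.
    outcome = 1.0 if decision == correct_decision else 0.0
    coverage = sum(s in performed_steps for s in REQUIRED_STEPS) / len(REQUIRED_STEPS)
    return 0.5 * outcome + 0.5 * coverage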
Learn After
Learning Analogy: Outcome vs. Process Feedback
A research team is training a language model to act as a programming assistant that writes complex, multi-step code functions. The training method rewards the model only if the final generated code executes without errors and produces the correct output. Despite extensive training, the model frequently generates code that is logically flawed, even when it happens to produce the correct final result for the training examples. Which of the following statements best analyzes the fundamental weakness of this training approach? (An execution-based reward sketch follows at the end of this list.)
Diagnosing Training Flaws in a Math AI
Critique of AI Training Methodologies for Complex Tasks
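For the programming-assistant question above, a minimal sketch of the execution-based outcome reward it describes; the solve entry point and the example test are hypothetical, not from the card:

def execution_reward(code: str, test_input, expected_output) -> float:
    # Reward 1.0 only if the generated code runs and its output matches.
    # Logically flawed code that happens to pass the single test earns
    # the same reward as a correct solution, so the flaw goes unpenalized.
    namespace: dict = {}
    try:
        exec(code, namespace)                     # define the generated function
        result = namespace["solve"](test_input)   # hypothetical entry point
    except Exception:
        return 0.0
    return 1.0 if result == expected_output else 0.0

# Example: the intended function is the identity, but this flawed
# implementation still earns full reward on a test case where x == 0.
buggy = "def solve(x):\n    return x * 2\n"
print(execution_reward(buggy, 0, 0))  # prints 1.0 despite the wrong logic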