Concept

Insufficiency of Outcome-Based Rewards for Complex Reasoning

For tasks that require complex reasoning, a reward model that evaluates only the correctness of the final output is insufficient for effective learning. Such outcome-based feedback carries no information about where the reasoning went wrong, so it cannot guide the model toward better intermediate problem-solving steps.
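The point can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy task of computing (3 + 4) * 2, the "a op b = c" step format, and the hypothetical helpers `outcome_reward`, `step_is_correct`, and `step_level_feedback`. It is not a real reward model, only a way to show that two reasoning traces with errors in different places receive the same outcome-based signal.

```python
import operator
from typing import Callable, List

# Hypothetical toy setup: each reasoning step is a string "a op b = c".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def outcome_reward(final_answer: str, gold_answer: str) -> float:
    """Outcome-based reward: looks only at the final answer."""
    return 1.0 if final_answer.strip() == gold_answer.strip() else 0.0


def step_is_correct(step: str) -> bool:
    """Toy checker: verify the arithmetic stated in a single step."""
    lhs, rhs = step.split("=")
    a, op, b = lhs.split()
    return OPS[op](int(a), int(b)) == int(rhs)


def step_level_feedback(
    steps: List[str], checker: Callable[[str], bool] = step_is_correct
) -> List[float]:
    """Step-level signal: one score per reasoning step."""
    return [1.0 if checker(s) else 0.0 for s in steps]


def final_answer(steps: List[str]) -> str:
    """Read the final answer off the right-hand side of the last step."""
    return steps[-1].split("=")[1].strip()


GOLD = "14"  # correct value of (3 + 4) * 2

# Trace A makes its mistake in the first step; trace B in the second.
trajectory_a = ["3 + 4 = 8", "8 * 2 = 16"]
trajectory_b = ["3 + 4 = 7", "7 * 2 = 15"]

for name, steps in [("A", trajectory_a), ("B", trajectory_b)]:
    print(
        name,
        "outcome:", outcome_reward(final_answer(steps), GOLD),
        "steps:", step_level_feedback(steps),
    )
```

Both traces get an outcome reward of 0.0, so that signal cannot distinguish them. The per-step scores differ ([0.0, 1.0] for trace A and [1.0, 0.0] for trace B), which is the information about where the reasoning failed that a final-answer-only reward leaves out.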

Updated 2025-10-07

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences