1Cademy - An AI team is training a model to solve complex, multi-step mathematical word problems. They are considering two different methods for providing feedback during training:<br><br>Method 1: The model generates the entire step-by-step solution and the final answer. It only receives a positive reward if the final numerical answer is correct.<br><br>Method 2: The model generates the solution one step at a time. It receives a positive reward for each individual step that is logically correct and follows from the

Learn Before

Importance of Step-by-Step Supervision for Complex Reasoning

Multiple Choice

An AI team is training a model to solve complex, multi-step mathematical word problems. They are considering two different methods for providing feedback during training:

Method 1: The model generates the entire step-by-step solution and the final answer. It only receives a positive reward if the final numerical answer is correct.

Method 2: The model generates the solution one step at a time. It receives a positive reward for each individual step that is logically correct and follows from the

Updated 2025-10-07

Contributors are:

Who are from:

Learn Before

Related