A research team is training a language model to solve multi-step physics problems. The model is trained on a dataset of problems and their final numerical answers. The training process provides a positive reward only if the model's final answer is correct. After extensive training, the model still struggles, often making logical errors in the intermediate steps of its reasoning. Which of the following best explains the fundamental flaw in this training approach?
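The flaw hinges on the reward depending only on the final answer. A minimal sketch (with hypothetical names, not any particular library's API) of the outcome-only reward described above shows why intermediate reasoning receives no training signal:

```python
def outcome_only_reward(reasoning_steps, final_answer, gold_answer):
    """Return 1.0 iff the final numerical answer matches the gold answer.

    Note that `reasoning_steps` is ignored entirely: a chain full of
    logical errors that happens to land on the right number is rewarded
    exactly like a fully correct derivation.
    """
    return 1.0 if abs(final_answer - gold_answer) < 1e-6 else 0.0

# A flawed chain with a lucky final answer gets full reward...
flawed = outcome_only_reward(
    ["F = ma", "a = 2 (sign error cancels)", "F = 20"], 20.0, 20.0)
# ...while a careful chain with a small slip at the end gets none.
careful = outcome_only_reward(
    ["F = ma", "a = 2", "F = 19.9 (rounding slip)"], 19.9, 20.0)
print(flawed, careful)  # 1.0 0.0
```

Because the reward function never inspects `reasoning_steps`, the model gets no gradient toward logically sound intermediate steps, which is consistent with the persistent intermediate errors the question describes.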
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Diagnosing a Flawed LLM Training Strategy
Evaluating LLM Training Strategies for a Tutoring Application