1Cademy

Multiple Choice

A team is fine-tuning a large language model to solve complex, multi-step logic puzzles. They are testing two different supervisory approaches:

- **Approach 1:** The model generates the full sequence of reasoning steps and provides a final answer. A human evaluator then checks only whether the final answer is correct. The model receives a positive signal if the answer is correct and a negative signal if it is incorrect, regardless of the reasoning steps.
- **Approach 2:** The model generates its reasoning one step at a time. After each step, a human evaluator checks whether that individual step is logically sound and correctly follows from the previous ones. The model receives a supervisory signal for each intermediate step in its reasoning chain.

What is the fundamental difference in how supervision is applied in these two approaches?
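
To make the two setups concrete, here is a minimal sketch of the shape of the signal each approach produces. Everything in it is an illustrative assumption rather than part of the question: the function names, the use of +1.0 and -1.0 as the positive and negative signals, and the toy evaluator callbacks that stand in for the human evaluator.

```python
from typing import Callable, List


def final_answer_signal(
    steps: List[str],
    answer: str,
    is_answer_correct: Callable[[str], bool],
) -> List[float]:
    """Approach 1 (sketch): a single signal for the whole attempt.

    The reasoning steps are accepted as input but never inspected;
    only the final answer is checked.
    """
    return [1.0 if is_answer_correct(answer) else -1.0]


def per_step_signals(
    steps: List[str],
    is_step_sound: Callable[[List[str], str], bool],
) -> List[float]:
    """Approach 2 (sketch): one signal per reasoning step.

    Each step is judged in the context of the steps that precede it.
    """
    signals = []
    for i, step in enumerate(steps):
        previous_steps = steps[:i]
        signals.append(1.0 if is_step_sound(previous_steps, step) else -1.0)
    return signals


# Hypothetical attempt: the middle step is flawed, yet the final answer is right.
steps = ["step 1", "step 2 (flawed)", "step 3"]
answer = "42"

# Toy stand-ins for the human evaluator (assumptions for illustration only).
check_answer = lambda a: a == "42"
check_step = lambda previous, step: "flawed" not in step

print(final_answer_signal(steps, answer, check_answer))  # [1.0]
print(per_step_signals(steps, check_step))               # [1.0, -1.0, 1.0]
```

In this toy run the first function returns one positive value even though a step was flawed, while the second returns one value per step and flags the flawed one.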


Updated 2025-09-28
