Learn Before
  • Classification of Instruction Fine-Tuning as an Alignment Problem


Classification of LLM Fine-Tuning Approaches for Reasoning Tasks

For reasoning tasks, where a large language model generates a sequence of steps leading to a verifiable final answer, Uesato et al. (2022) group fine-tuning methods into two main categories: outcome-based approaches, which supervise only the correctness of the final answer, and process-based approaches, which supervise each intermediate reasoning step.
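The two categories can be sketched as a difference in how reward signals are assigned to a reasoning chain. The following is a minimal, hypothetical illustration, not an implementation from the source: the function names, reward values, and the stubbed-out correctness labels are illustrative assumptions (a real system would obtain them from human raters or a learned reward model).

```python
# Hypothetical sketch of the two supervisory regimes.
# Correctness labels are stubbed out as booleans; in practice they
# would come from human evaluators or a learned reward model.
from typing import List

def outcome_based_signal(final_answer_correct: bool, num_steps: int) -> List[float]:
    """Outcome-based: one signal for the whole chain, derived solely
    from whether the final answer is correct."""
    reward = 1.0 if final_answer_correct else -1.0
    return [reward] * num_steps

def process_based_signal(step_is_sound: List[bool]) -> List[float]:
    """Process-based: one signal per reasoning step, each judged on
    whether that step is logically sound on its own."""
    return [1.0 if sound else -1.0 for sound in step_is_sound]

# A chain with a flawed middle step that still lands on the right answer:
steps_sound = [True, False, True]
print(outcome_based_signal(final_answer_correct=True, num_steps=3))  # [1.0, 1.0, 1.0]
print(process_based_signal(steps_sound))                             # [1.0, -1.0, 1.0]
```

The contrast is visible in the example: outcome-based supervision rewards the entire chain because the final answer is correct, while process-based supervision penalizes the unsound middle step even though the answer came out right.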


Updated 2026-05-03

Contributors: Gemini AI (Google)

References


  • Reference of Foundations of Large Language Models Course

Tags

  • Ch.4 Alignment - Foundations of Large Language Models
  • Ch.5 Inference - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences

Related

  • A development team is updating a pre-trained language model by further training it on a curated dataset of specific prompts and their desired, high-quality outputs (e.g., prompt: 'Explain gravity to a 5-year-old,' output: 'Gravity is like a big, invisible hug from the Earth...'). Why is this specific training process considered a method for model alignment?

  • Evaluating the Purpose of Instruction-Based Training

  • The process of adapting a pre-trained language model using a dataset of instructions and their corresponding desired outputs is categorized as an alignment problem because its primary goal is to enhance the model's core linguistic knowledge and predictive accuracy.

Learn After
  • Outcome-based Approaches for LLM Fine-Tuning

  • Process-based Approaches for LLM Fine-Tuning

  • A team is fine-tuning a large language model to solve complex, multi-step logic puzzles. They are testing two different supervisory approaches:

    • Approach 1: The model generates the full sequence of reasoning steps and provides a final answer. A human evaluator then checks only if the final answer is correct. The model receives a positive signal if the answer is correct and a negative signal if it is incorrect, regardless of the reasoning steps.
    • Approach 2: The model generates its reasoning one step at a time. After each step, a human evaluator checks if that individual step is logically sound and correctly follows from the previous ones. The model receives a supervisory signal for each intermediate step in its reasoning chain.

    What is the fundamental difference in how supervision is applied in these two approaches?

  • Recommending a Fine-Tuning Strategy for an AI Algebra Tutor

  • A team is fine-tuning a large language model for multi-step reasoning tasks. They are considering two general approaches for providing supervision: one that focuses only on the final answer, and one that evaluates each step of the reasoning process. Classify each of the following scenarios or characteristics by matching it to the correct supervisory approach.

© 1Cademy 2026
