Outcome-based Approaches for LLM Fine-Tuning
In outcome-based approaches to LLM fine-tuning, supervision is applied exclusively to the verified end result. The model is optimized to maximize some form of reward based on the final outcome, such as a pass/fail signal on the final answer or a score from a reward model. This is the standard methodology for learning from human feedback, where evaluation covers the complete input-output sequence rather than the intermediate steps.
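As a rough illustration, here is a minimal sketch in plain Python of how an outcome-based reward assigns a single scalar to an entire generated sequence based solely on its final answer. The `extract_final_answer` helper and the last-line answer convention are assumptions for the toy example, not part of any particular framework.

```python
def extract_final_answer(sequence: str) -> str:
    """Assumption for this sketch: the final answer is the last line of the output."""
    return sequence.strip().splitlines()[-1]

def outcome_reward(sequence: str, reference_answer: str) -> float:
    """Score only the end result; intermediate steps are never inspected."""
    return 1.0 if extract_final_answer(sequence) == reference_answer else 0.0

# Toy usage: two responses with different reasoning quality but the same verdict.
good = "Step 1: 2 + 2 = 4\nStep 2: 4 * 3 = 12\n12"
bad = "Step 1: 2 + 2 = 5\nStep 2: nonsense\n12"  # flawed steps, correct answer

print(outcome_reward(good, "12"))  # 1.0
print(outcome_reward(bad, "12"))   # 1.0 -- outcome-based supervision cannot tell them apart
```

Because the signal depends only on the outcome, a response with flawed intermediate reasoning but a correct final answer earns the same reward as a fully sound one.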
A team is fine-tuning a large language model to solve complex, multi-step logic puzzles. They are testing two different supervisory approaches:
- Approach 1: The model generates the full sequence of reasoning steps and a final answer. A human evaluator then checks only whether the final answer is correct. The model receives a positive signal if the answer is correct and a negative signal if it is incorrect, regardless of the reasoning steps.
- Approach 2: The model generates its reasoning one step at a time. After each step, a human evaluator checks whether that individual step is logically sound and follows correctly from the previous ones. The model receives a supervisory signal for each intermediate step in its reasoning chain.
What is the fundamental difference in how supervision is applied in these two approaches?
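For concreteness, the sketch below shows where the supervisory signal attaches in each approach: once per attempt in Approach 1, once per step in Approach 2. The `step_is_sound` helper and the toy "wrong"-marker convention are hypothetical stand-ins for human judgment.

```python
from typing import List

def step_is_sound(step: str, previous_steps: List[str]) -> bool:
    """Stand-in for a human judgment of one reasoning step; in this toy
    example, unsound steps are simply marked with the word 'wrong'."""
    return "wrong" not in step

def approach_1_signal(final_answer: str, reference: str) -> float:
    """Approach 1: a single signal for the whole attempt, keyed to the outcome only."""
    return 1.0 if final_answer == reference else -1.0

def approach_2_signals(steps: List[str]) -> List[float]:
    """Approach 2: one signal per step, keyed to each step's soundness."""
    return [1.0 if step_is_sound(step, steps[:i]) else -1.0
            for i, step in enumerate(steps)]

steps = ["Assume P implies Q",
         "Observe Q",
         "Conclude P (wrong: affirms the consequent)"]
print(approach_1_signal(final_answer="P", reference="P"))  # 1.0, despite the flawed step
print(approach_2_signals(steps))                           # [1.0, 1.0, -1.0]
```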
Recommending a Fine-Tuning Strategy for an AI Algebra Tutor
A team is fine-tuning a large language model for multi-step reasoning tasks. They are considering two general approaches for providing supervision: one that focuses only on the final answer, and one that evaluates each step of the reasoning process. Classify each of the following scenarios or characteristics by matching it to the correct supervisory approach.
A team is fine-tuning a language model to act as a programming assistant that writes code. For each programming problem, the model generates a block of code. The fine-tuning process involves running the generated code against a set of predefined tests. If the code passes all the tests, the model receives a high reward. If it fails any test, it receives a low reward. The structure, style, or efficiency of the code itself is not directly evaluated for the reward signal. Which principle of model fine-tuning does this scenario best exemplify?
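As a loose illustration of the reward scheme this scenario describes, the sketch below assumes the generated code defines a `solve` function and that the predefined tests are (input, expected output) pairs; both the name and the convention are hypothetical.

```python
def unit_test_reward(generated_code, tests):
    """Binary outcome reward: 1.0 iff the code passes every predefined test.
    `tests` is a list of (input, expected_output) pairs (an assumption)."""
    namespace = {}
    try:
        exec(generated_code, namespace)          # run the model-generated code
        solve = namespace["solve"]               # assumed entry point
        passed = all(solve(x) == expected for x, expected in tests)
    except Exception:
        passed = False                           # crashes or a missing `solve` count as failure
    return 1.0 if passed else 0.0

code = "def solve(x):\n    return x * 2"
print(unit_test_reward(code, [(1, 2), (3, 6)]))  # 1.0 -- all tests pass
print(unit_test_reward(code, [(1, 3)]))          # 0.0 -- any failing test yields a low reward
```

Only the pass/fail outcome feeds the reward; the code's structure, style, and efficiency never enter the computation.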