Concept

Outcome-based Approaches for LLM Fine-Tuning

In outcome-based approaches to LLM fine-tuning, supervision is applied only when the end result can be verified. The model is optimized to maximize a reward $r(\mathbf{x},\mathbf{y})$ computed on the final outcome. This is the standard methodology for learning from human feedback, where evaluation focuses on the complete input-output sequence rather than on intermediate steps.
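A minimal sketch of the idea, using a toy arithmetic task: the reward depends only on the final output, so a candidate is scored without inspecting its intermediate steps. The names `outcome_reward`, `best_of_n`, and the toy `verifier` are illustrative assumptions, not part of the original text.

```python
def outcome_reward(x, y, verifier):
    """Reward r(x, y) depends only on the complete output y,
    not on any intermediate reasoning step."""
    return 1.0 if verifier(x, y) else 0.0

def best_of_n(x, candidates, verifier):
    """Score each complete candidate by its outcome reward
    and return the highest-scoring one."""
    scored = [(outcome_reward(x, y, verifier), y) for y in candidates]
    return max(scored, key=lambda t: t[0])[1]

# Toy task: x is a pair of numbers to add; the verifier checks
# only the final number in the output string.
def verifier(x, y):
    return y.strip().endswith(str(sum(x)))

x = (2, 3)
candidates = ["2 + 3 = 4", "first add 2 and 3 ... = 5", "2 + 3 = 6"]
print(best_of_n(x, candidates, verifier))
# -> "first add 2 and 3 ... = 5"
```

In an actual fine-tuning loop the same scalar reward would weight a policy-gradient update; the key property shown here is that wrong intermediate text is never penalized as long as the final outcome verifies.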

Updated 2026-05-03

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models