Comparison of Objectives: Supervised Fine-Tuning vs. RLHF

Supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) represent two distinct methodologies for training large language models. In supervised fine-tuning, the language model is optimized by maximizing the conditional probability of the reference output given the input, which is equivalent to minimizing the cross-entropy loss against the gold tokens. In contrast, RLHF first trains a reward model on human preference data, where evaluators select their preferred response from pairs of model outputs. This reward model is then used to supervise the language model during fine-tuning: it scores newly generated outputs, and the model parameters are updated with reinforcement learning algorithms to increase the expected reward.
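The contrast between the two objectives can be sketched in a few lines. Below is a toy illustration (not the book's implementation): SFT minimizes the negative log-likelihood of the reference tokens, while the RLHF reward model is typically fit with a Bradley-Terry pairwise loss on preference data; the function names and scores are hypothetical.

```python
import math

def sft_loss(token_logprobs):
    """SFT objective: negative log-likelihood of the reference tokens.

    `token_logprobs` are the model's log-probabilities for each gold
    token; maximizing their sum is minimizing this loss.
    """
    return -sum(token_logprobs)

def preference_loss(reward_chosen, reward_rejected):
    """Reward-model objective (Bradley-Terry style):
    -log sigmoid(r_chosen - r_rejected), pushing the preferred
    response's score above the rejected one's.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy values for illustration only.
print(sft_loss([-0.1, -0.3, -0.2]))      # sum of per-token NLLs
print(preference_loss(2.0, 0.5))         # small when chosen >> rejected
```

In full RLHF, the trained reward model's scores then drive a policy-gradient update (e.g., PPO) of the language model, rather than a direct cross-entropy update as in SFT.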


Updated 2026-05-01


Ch.4 Alignment - Foundations of Large Language Models
