Learn Before
Supervised Fine-Tuning (SFT)
Supervised Fine-Tuning (SFT) is a direct method for adapting pre-trained Large Language Models to follow instructions by training them on a dataset of annotated input-output pairs. In contrast to the pre-training objective of maximizing the probability of an entire sequence, SFT's goal is to maximize the conditional probability of generating the correct output given the input prefix. This process, formalized as Maximum Likelihood Estimation (MLE), teaches the model to produce the desired 'gold-standard' response for a given instruction.
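The MLE objective described above can be sketched in a few lines: the loss is the average negative log-likelihood of the target tokens, computed only over the response (output) tokens while the instruction (input prefix) tokens are masked out. This is a minimal illustrative sketch, not a full training loop; the function name `sft_loss` and the toy log-probability values are hypothetical.

```python
import math

def sft_loss(token_logprobs, loss_mask):
    """Average negative log-likelihood over response tokens only.

    token_logprobs: log-probability the model assigns to each target token
    loss_mask: 1 for response (output) tokens, 0 for prompt (input) tokens
    """
    masked = [lp for lp, m in zip(token_logprobs, loss_mask) if m]
    return -sum(masked) / len(masked)

# Toy sequence: 3 prompt tokens (excluded from the loss) + 2 response tokens.
logprobs = [-0.1, -0.2, -0.3, -0.5, -0.7]
mask     = [0,    0,    0,    1,    1]
loss = sft_loss(logprobs, mask)  # -((-0.5) + (-0.7)) / 2 = 0.6
```

Masking the prompt tokens is what distinguishes this conditional objective from the pre-training objective, which would include every token in the sequence in the loss.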

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Related
Instruction-Following Ability in LLMs
Supervised Fine-Tuning (SFT)
Instruction Data Generation and Collection
Generalization in Instruction Alignment
Suitability of Instruction Fine-Tuning for Well-Defined Tasks
An AI developer provides the exact same input to two different large language models. Model A is a base model trained solely to predict the next word in a sequence. Model B is the same base model but has undergone an additional tuning process.
Input given to both models: "Instruction: Summarize the following paragraph in exactly one sentence. Paragraph: The process of photosynthesis allows plants to convert light energy into chemical energy. This chemical energy is stored in the form of glucose, which serves as the primary source of food for the plant. During this process, carbon dioxide is absorbed from the atmosphere and oxygen is released as a byproduct, which is essential for most life on Earth."
Model A's Output: "This process is crucial for maintaining the balance of gases in our planet's atmosphere and provides the foundation for nearly all terrestrial ecosystems."
Model B's Output: "Photosynthesis is the process where plants use light energy to create their own food, converting carbon dioxide into oxygen as a byproduct."
Based on these outputs, which statement provides the most accurate analysis of the models' behaviors?
Diagnosing and Correcting LLM Behavior
Supervised Fine-Tuning (SFT) as an Example of Labeled Data Fine-Tuning
An AI development team is creating a dataset to fine-tune a pre-trained language model, aiming to improve its ability to follow user commands. Which of the following instruction-response pairs represents the highest-quality data point for this specific purpose?
Learn After
A team is fine-tuning a pre-trained language model on a dataset of high-quality instruction-response pairs. Training adjusts the model's parameters to maximize the probability of generating the exact target response for each instruction. After training, the team observes that the model's responses are often factually correct but much shorter and less detailed than the examples in the dataset. Given the training objective, what is the most likely reason for this behavior?
Comparing Training Objectives for Model Adaptation
Adapting a General Model for a Specialized Task