Concept

Potential for Undesirable Content Generation After SFT

Even after pre-training and supervised fine-tuning (SFT), a Large Language Model may still produce outputs that are factually incorrect, biased, or harmful when responding to user prompts. This limitation of SFT motivates further alignment steps, typically based on human preference feedback, to ensure the model's behavior is consistently safe and helpful.
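One concrete example of such a post-SFT alignment step is Direct Preference Optimization (DPO), which trains the policy on pairs of responses where humans preferred one over the other. The sketch below is illustrative, not part of this card: it computes the DPO loss for a single preference pair from summed log-probabilities, with the SFT model serving as the frozen reference; the function name and `beta` default are this sketch's own choices.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Inputs are summed log-probabilities of the human-chosen and
    human-rejected responses under the policy being trained and
    under a frozen reference (SFT) model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    # Loss shrinks as the policy widens the gap in favor of the chosen response.
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy and reference agree exactly, the loss is log 2; it falls below that as the policy learns to prefer the chosen response, pushing generation away from the undesirable outputs SFT alone leaves in place.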

Updated 2026-04-20

Tags: Ch.2 Generative Models - Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences
