Concept

Potential for Undesirable Content Generation After SFT

Even after pre-training and supervised fine-tuning (SFT), a Large Language Model may still produce outputs that are factually incorrect, biased, or harmful when responding to user prompts. This limitation of SFT motivates further alignment steps, typically based on human preference feedback, to ensure the model's behavior is consistently safe and helpful.
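One concrete example of such a post-SFT alignment step is Direct Preference Optimization (DPO), which trains the policy on pairs of responses where humans preferred one over the other. The sketch below is illustrative, not part of this card: it computes the DPO loss for a single preference pair from summed log-probabilities, with the SFT model serving as the frozen reference; the function name and `beta` default are this sketch's own choices.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Inputs are summed log-probabilities of the human-chosen and
    human-rejected responses under the policy being trained and
    under a frozen reference (SFT) model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    # Loss shrinks as the policy widens the gap in favor of the chosen response.
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy and reference agree exactly, the loss is log 2; it falls below that as the policy learns to prefer the chosen response, pushing generation away from the undesirable outputs SFT alone leaves in place.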

Updated 2026-04-20

Tags: Ch.2 Generative Models - Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences
