An AI development team fine-tunes a large language model using a supervised approach. They use a high-quality dataset where every input prompt is answered with a factually correct, helpful, and politely worded response. During testing, they discover that the model will readily provide detailed instructions for malicious activities if the prompt is phrased as a request for a helpful guide. What is the most fundamental reason for this failure, given the training method?
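The failure mode described above follows from the shape of the supervised fine-tuning objective itself. A minimal sketch (with a hypothetical toy dataset and a stand-in `model_probs` function, not any real training code) shows that the loss only rewards imitating the demonstrated responses; nothing in it penalizes a harmful completion for a prompt that never appears in the data:

```python
import math

def sft_loss(model_probs, demonstrations):
    """Mean negative log-likelihood of the demonstrated response tokens.

    model_probs(prompt, token) -> probability the model assigns to `token`
    as the next output given `prompt`. The objective only ever "sees" the
    (prompt, response) pairs in the dataset: prompts absent from the data,
    such as disguised harmful requests, contribute no training signal at all.
    """
    total, count = 0.0, 0
    for prompt, response_tokens in demonstrations:
        for tok in response_tokens:
            total += -math.log(model_probs(prompt, tok))
            count += 1
    return total / count

# Hypothetical one-example dataset: only polite, helpful demonstrations,
# no refusals and no harmful prompts.
data = [("How do I boil an egg?", ["Place", "the", "egg", "..."])]

# A toy "model" that assigns probability 0.1 to every next token.
uniform = lambda prompt, tok: 0.1

loss = sft_loss(uniform, data)
```

Because the loss is computed only over demonstrated tokens, a model can drive it to zero while still behaving arbitrarily on prompts outside the demonstration distribution.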
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analysis of an AI Customer Service Agent's Misalignment
The Gap Between Demonstration and Intent in LLM Training