Multiple Choice

An AI development team fine-tunes a large language model using a supervised approach. They use a high-quality dataset in which every input prompt is answered with a factually correct, helpful, and politely worded response. During testing, they discover that the model will readily provide detailed instructions for malicious activities if the prompt is phrased as a request for a helpful guide. Given the training method, what is the most fundamental reason for this failure?
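To reason about the question, it helps to look at the supervised fine-tuning objective itself. The sketch below is a hypothetical toy illustration (the function name and example probabilities are assumptions, not from any specific library): the SFT loss is just the negative log-likelihood of the demonstrated response tokens, so it only rewards imitating the curated answers and contains no term that penalizes harmful completions on prompts outside the demonstration distribution.

```python
import math

def sft_loss(token_probs):
    """Toy supervised fine-tuning loss for one demonstration.

    token_probs: the model's probability for each target token of the
    curated, helpful response. The loss is the average negative
    log-likelihood of those tokens -- note there is no term here that
    penalizes harmful alternatives; the objective only rewards
    imitating the demonstrations it was shown.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# A model that imitates the polite, helpful demonstrations well gets a
# low loss, regardless of how it behaves on adversarially rephrased
# prompts that never appeared in the training data.
loss = sft_loss([0.9, 0.8, 0.95])
print(round(loss, 4))  # → 0.1266
```

Because the objective never sees (let alone penalizes) refusal-worthy prompts, a model can minimize it perfectly while still complying with malicious requests that are dressed up as helpful ones.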

Updated 2025-10-06

Tags: Ch.4 Alignment - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science