Analysis of an AI Customer Service Agent's Misalignment
A company fine-tunes a large language model to act as a customer service agent. The training dataset consists of thousands of conversation logs where human agents successfully resolved customer complaints. A common pattern in these successful logs is that the agent apologizes and offers a small discount. After deployment, the company observes that the AI agent apologizes and offers discounts for every single complaint, even for issues that are not the company's fault or for which a technical solution is required. Based on the principles of supervised fine-tuning, analyze the most likely reason for this specific, undesirable behavior.
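The failure mode the question describes is pure imitation: supervised fine-tuning rewards reproducing the surface pattern of successful demonstrations, not the underlying intent. A minimal sketch of this behavioral-cloning dynamic (toy data and function names are illustrative, not the company's actual pipeline):

```python
from collections import Counter

# Hypothetical toy training logs: (complaint, agent_response) pairs.
# As in the scenario, most successful resolutions ended with an
# apology plus a small discount, regardless of the complaint type.
training_logs = [
    ("late delivery", "apologize + offer discount"),
    ("damaged item", "apologize + offer discount"),
    ("billing error", "apologize + offer discount"),
    ("app crashes on login", "escalate to technical support"),
    ("wrong size shipped", "apologize + offer discount"),
]

def fit_imitator(logs):
    """Caricature of supervised fine-tuning as behavioral cloning:
    learn the single most frequent response pattern, with no signal
    about *why* that pattern resolved each individual complaint."""
    counts = Counter(response for _, response in logs)
    most_common_response, _ = counts.most_common(1)[0]
    return lambda complaint: most_common_response

agent = fit_imitator(training_logs)

# The imitator applies the majority pattern to every input,
# including complaints that actually need a technical fix.
print(agent("app crashes on login"))  # → apologize + offer discount
```

The sketch exaggerates for clarity, but the mechanism matches the question: the training signal says "this output followed this input in successful conversations," never "this output was appropriate because of X," so the dominant pattern generalizes to inputs where it is wrong.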
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An AI development team fine-tunes a large language model using a supervised approach. They use a high-quality dataset in which every input prompt is answered with a factually correct, helpful, and politely worded response. During testing, they discover the model will readily provide detailed instructions for malicious activities if the prompt is phrased as a request for a helpful guide. Given the training method, what is the most fundamental reason for this failure?
The Gap Between Demonstration and Intent in LLM Training