Case Study

Diagnosing a Mismatch in Automated Selection

An AI email assistant is designed to generate three draft replies to a customer's complaint and then automatically select the best one. It selects the candidate that receives the highest score from an internal evaluation model, r. For a specific complaint, x, the assistant generates the following drafts and scores:

  • Draft ŷ₁ (Polite but generic): r(x, ŷ₁) = 0.92
  • Draft ŷ₂ (Empathetic and offers a specific solution): r(x, ŷ₂) = 0.88
  • Draft ŷ₃ (Dismissive): r(x, ŷ₃) = 0.31
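
The selection rule described above amounts to taking the argmax of r(x, ŷᵢ) over the three drafts. The following is a minimal Python sketch of that rule using the scores listed above; the names `scores` and `select_best` are hypothetical and are not taken from the assistant's actual implementation.

```python
# Scores from the case study: r(x, y_i) for each draft.
# The dictionary keys are illustrative labels, not real identifiers.
scores = {
    "y1": 0.92,  # polite but generic
    "y2": 0.88,  # empathetic, offers a specific solution
    "y3": 0.31,  # dismissive
}


def select_best(scores: dict[str, float]) -> str:
    """Return the candidate with the highest evaluation score (argmax over r)."""
    return max(scores, key=scores.get)


selected = select_best(scores)
print(selected)  # prints "y1"
```

In this sketch the only input to the selection is the score from r, so the chosen draft is whichever one r ranks highest.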

A human manager reviews the drafts and determines that Draft ŷ₂ is by far the best response for customer satisfaction. However, the system selects Draft ŷ₁. Based on the selection process described, what is the most likely reason for this discrepancy?

Updated 2025-10-08

Tags

Ch.5 Inference - Foundations of Large Language Models
