Diagnosing a Mismatch in Automated Selection
An AI email assistant is designed to generate three draft replies to a customer's complaint and then automatically select the best one. It selects the candidate that receives the highest score from an internal evaluation model, r. For a specific complaint, x, the assistant generates the following drafts and scores:
- Draft ŷ₁ (Polite but generic): r(x, ŷ₁) = 0.92
- Draft ŷ₂ (Empathetic and offers a specific solution): r(x, ŷ₂) = 0.88
- Draft ŷ₃ (Dismissive): r(x, ŷ₃) = 0.31
A human manager reviews the drafts and determines that Draft ŷ₂ is by far the best response for customer satisfaction. However, the system selects Draft ŷ₁. Based on the selection process described, what is the most likely reason for this discrepancy?
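The selection step described above can be reproduced in a few lines. This is only a minimal sketch: the dictionary layout, the key names `draft_1` to `draft_3`, and the helper `select_best` are assumptions made for illustration, and the scores are copied from the drafts listed above.

```python
# Scores r(x, ŷ_i) copied from the scenario; the keys are hypothetical labels.
scores = {
    "draft_1": 0.92,  # polite but generic
    "draft_2": 0.88,  # empathetic, offers a specific solution
    "draft_3": 0.31,  # dismissive
}


def select_best(scores):
    """Return the label of the candidate with the highest score."""
    # max over the keys, ranked by their score, acts as an argmax.
    return max(scores, key=scores.get)


selected = select_best(scores)
print(selected)
```

The rule consults only the numbers in `scores`; nothing else about the drafts enters the choice.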
Tags
Ch.5 Inference - Foundations of Large Language Models
Related
Argmax Formula for Best Candidate Selection in BoN Sampling
A system generates four candidate outputs in response to a user's prompt. A separate evaluation model then assigns a quality score to each candidate, where a higher score indicates a better response. The system's selection rule is to choose the candidate that receives the maximum score. Given the scores below, which candidate will be selected as the final output?
- Candidate A: Score = 0.85
- Candidate B: Score = 0.91
- Candidate C: Score = 0.74
- Candidate D: Score = 0.23
Consider the process of selecting the best output from a set of N candidates, where a reward model r scores each candidate ŷ_i based on an input x. The selection is represented by the formula: ŷ_best = max{r(x, ŷ_1), ..., r(x, ŷ_N)}. This formula implies that the final output, ŷ_best, is a numerical value representing the highest score.
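One way to examine the statement above is to compare what a max operation and an argmax operation return on the same scores. The sketch below uses made-up candidates and scores (both hypothetical, not taken from the source) and plain Python built-ins.

```python
# Hypothetical candidates and scores, for illustration only.
candidates = ["reply A", "reply B", "reply C"]
scores = [0.40, 0.75, 0.60]

# max over the scores returns a number: the highest score itself.
highest_score = max(scores)

# argmax returns the position of the highest score,
# which can then be used to look up the candidate.
best_index = max(range(len(scores)), key=scores.__getitem__)
best_candidate = candidates[best_index]

print(type(highest_score).__name__, highest_score)
print(type(best_candidate).__name__, best_candidate)
```

Comparing the two printed types shows which operation yields a score and which yields a candidate output.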