Conflicting Evaluation Criteria in AI Feedback
A human evaluator is comparing two AI-generated responses to the prompt: 'Explain the water cycle for a 10-year-old.'
- Response A is scientifically precise and comprehensive but uses complex vocabulary that a 10-year-old would find difficult to understand.
- Response B uses simple language and analogies, making it very easy for a 10-year-old to grasp, but it slightly oversimplifies one technical detail.
Which response should the evaluator likely prefer, and why? Analyze the trade-off between the evaluation criteria of accuracy and clarity/relevance in your answer.
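One way to reason about the trade-off is to treat each criterion as a score and let the target audience set the weights. The sketch below is illustrative only: the scores, the weights, and the names weighted_score and prefer are hypothetical assumptions, not part of any real evaluation protocol or measured data.

```python
# Hypothetical 0-1 scores an evaluator might assign to each response.
# These numbers are illustrative assumptions, not measured data.
RESPONSES = {
    "A": {"accuracy": 0.95, "clarity": 0.30},  # precise but hard for a child
    "B": {"accuracy": 0.80, "clarity": 0.95},  # simple, slightly oversimplified
}

# Hypothetical weights: the audience named in the prompt decides how much
# each criterion matters.
CHILD_AUDIENCE = {"accuracy": 0.4, "clarity": 0.6}
EXPERT_AUDIENCE = {"accuracy": 0.9, "clarity": 0.1}


def weighted_score(scores, weights):
    """Return the weighted average of the criterion scores."""
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def prefer(responses, weights):
    """Return the name of the response with the highest weighted score."""
    return max(responses, key=lambda name: weighted_score(responses[name], weights))


print(prefer(RESPONSES, CHILD_AUDIENCE))   # B
print(prefer(RESPONSES, EXPERT_AUDIENCE))  # A
```

With these assumed numbers, the same pair of responses yields a different preference once the audience changes, which is the point of the question: the criteria are weighed against the prompt's stated goal rather than applied in isolation.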
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating Competing AI Responses
A human evaluator is comparing pairs of AI-generated responses for two different user requests. Request 1 asks for a factual summary of a specific scientific process. Request 2 asks for a creative and engaging short story. How should the evaluator's focus on different quality criteria shift between these two tasks?
Conflicting Evaluation Criteria in AI Feedback
A human evaluator is reviewing several pairs of AI-generated responses to a user's prompt. Below are descriptions of flaws found in some of the less-preferred responses. Match each flaw description to the primary evaluation criterion it violates.