A human evaluator is comparing pairs of AI-generated responses for two different user requests. Request 1 asks for a factual summary of a specific scientific process. Request 2 asks for a creative and engaging short story. How should the evaluator's focus on different quality criteria shift between these two tasks?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating Competing AI Responses
Conflicting Evaluation Criteria in AI Feedback
A human evaluator is reviewing several pairs of AI-generated responses to a user's prompt. Below are descriptions of flaws found in some of the less-preferred responses. Match each flaw description to the primary evaluation criterion it violates.