Evaluating Competing AI Responses
A human evaluator is tasked with choosing the better of two AI-generated responses to a prompt. The evaluator must consider criteria such as clarity, relevance, and accuracy. Read the prompt and the two responses below. Which response should the evaluator choose, and why? Justify your answer by analyzing how each response performs against the evaluation criteria.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating Competing AI Responses
A human evaluator is comparing pairs of AI-generated responses for two different user requests. Request 1 asks for a factual summary of a specific scientific process. Request 2 asks for a creative and engaging short story. How should the evaluator's focus on different quality criteria shift between these two tasks?
Conflicting Evaluation Criteria in AI Feedback
A human evaluator is reviewing several pairs of AI-generated responses to a user's prompt. Below are descriptions of flaws found in some of the less-preferred responses. Match each flaw description to the primary evaluation criterion it violates.