Evaluating AI-Generated Responses
A user provides the following request to a conversational AI: 'What's a good, healthy, and quick breakfast idea for a busy weekday morning?' The AI generates the four responses listed below. Your task is to act as a human evaluator. Rank the responses from most helpful (1) to least helpful (4) and provide a brief justification for your ranking, explaining how well each response meets the user's specific needs.
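A ranking of this kind is commonly converted into pairwise preferences before it is used as training data for a reward model. The following is a minimal sketch of that conversion, not a prescribed method: the function name `ranking_to_pairs`, the response labels A to D, and the example ordering are all hypothetical and are not taken from the exercise.

```python
from itertools import combinations


def ranking_to_pairs(ranking):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs.

    `ranking` is a list of response labels ordered from most helpful
    to least helpful. combinations() preserves input order, so the first
    item of every pair is always ranked above the second.
    """
    return list(combinations(ranking, 2))


# Hypothetical ranking of four responses, most helpful first.
# The labels and their order are illustrative assumptions only.
ranking = ["B", "D", "A", "C"]
pairs = ranking_to_pairs(ranking)

for preferred, rejected in pairs:
    print(f"{preferred} is preferred over {rejected}")
```

Ranking four responses yields six pairwise comparisons, which is one reason evaluators are often asked for a full ranking rather than a single best pick.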
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating AI-Generated Responses
A key step in gathering data for Reinforcement Learning from Human Feedback (RLHF) is to have a language model generate multiple, varied responses to a single prompt. Which of the following sets of responses to the prompt 'What are the benefits of regular exercise?' best exemplifies the desired diversity and quality for this data collection process?
In a data collection process where a language model generates multiple outputs for a single prompt to be evaluated by humans, the model was given the prompt: 'How can I improve my public speaking skills?'. It produced the following four responses. What is the primary weakness of this set of responses for its intended purpose?
- Response A: Practice your speech in front of a mirror to get comfortable with the material.
- Response B: Rehearse your presentation multiple times to build confidence.
- Response C: Run through your talk several times before the actual event.
- Response D: Join a local public speaking club to get feedback and practice in a supportive environment.