A key step in gathering data for Reinforcement Learning from Human Feedback (RLHF) is to have a language model generate multiple, varied responses to a single prompt. Which of the following sets of responses to the prompt 'What are the benefits of regular exercise?' best exemplifies the desired diversity and quality for this data collection process?
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating AI-Generated Responses
In a data collection process where a language model generates multiple responses to a single prompt for human evaluation, the model was given the prompt 'How can I improve my public speaking skills?' and produced the following four responses. What is the primary weakness of this set for its intended purpose?
- Response A: Practice your speech in front of a mirror to get comfortable with the material.
- Response B: Rehearse your presentation multiple times to build confidence.
- Response C: Run through your talk several times before the actual event.
- Response D: Join a local public speaking club to get feedback and practice in a supportive environment.
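The weakness above (Responses A–C are near-paraphrases of "practice repeatedly") stems from the collection loop, not the model: sampling several times without encouraging variety tends to yield redundant candidates. A minimal sketch of the collection step the questions describe is below; `sample_response` is a hypothetical stub standing in for a real model API call, and raising temperature on repeats is one simple (assumed, not prescribed) tactic for pushing toward distinct responses.

```python
import random


def sample_response(prompt, temperature, rng):
    # Hypothetical stub for a real LLM call; a production system would query
    # a model API here. The stub ignores temperature and just picks from a
    # canned pool, but a real model would produce more varied text as
    # temperature rises.
    canned = [
        "Rehearse your presentation multiple times to build confidence.",
        "Join a local public speaking club to get feedback and practice.",
        "Record yourself speaking and review the footage for nervous habits.",
        "Study talks by speakers you admire and note their pacing and pauses.",
    ]
    return rng.choice(canned)


def collect_candidates(prompt, n=4, seed=0):
    """Collect n distinct candidate responses for one prompt, as RLHF
    comparison data requires. Exact repeats are dropped, and temperature is
    nudged upward when a repeat occurs to encourage variety."""
    rng = random.Random(seed)
    temperature = 0.7
    candidates = []
    while len(candidates) < n:
        resp = sample_response(prompt, temperature, rng)
        if resp not in candidates:
            candidates.append(resp)
        else:
            temperature += 0.2  # repeat seen: push sampling toward variety
    return candidates
```

Note that this only filters *exact* repeats; Responses A–C in the question would all survive it, since they are semantic paraphrases with different surface wording. Catching those requires a semantic-similarity check (e.g., embedding distance), which is beyond a lexical filter like this.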