Learn Before
Example of a User Prompt in RLHF
An example of a user prompt that an LLM might receive at the start of the Reinforcement Learning from Human Feedback (RLHF) process is: 'How can I live a more environmentally friendly life?' A prompt like this is fed to the model to generate multiple candidate responses, which human labelers then rank from best to worst.
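The data-collection step described above can be sketched as a small script. This is a minimal, hypothetical illustration: `mock_generate` is a stand-in for sampling a fine-tuned LLM at a temperature above zero, and the `ranking` field is a placeholder for the ordering a human labeler would supply.

```python
# Hypothetical sketch of RLHF preference-data collection.
# One prompt -> several candidate responses -> a human ranking.

def mock_generate(prompt: str, sample_idx: int) -> str:
    """Stand-in for sampling an LLM; a real pipeline would call the
    model with temperature > 0 so each sample differs."""
    return f"Candidate response {sample_idx} to: {prompt}"

def collect_preference_example(prompt: str, n_samples: int = 4) -> dict:
    """Generate several responses to one prompt and attach a ranking slot."""
    responses = [mock_generate(prompt, i) for i in range(n_samples)]
    # In practice a human labeler fills this in (best to worst);
    # here it is a placeholder identity ordering.
    ranking = list(range(n_samples))
    return {"prompt": prompt, "responses": responses, "ranking": ranking}

example = collect_preference_example(
    "How can I live a more environmentally friendly life?"
)
print(len(example["responses"]))  # 4 candidate responses for one prompt
```

Repeating this loop over thousands of prompts yields the ranked comparison data used to train a reward model.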
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Training a Reward Model with Preference Data
Techniques for Generating Diverse Outputs in RLHF
A team is developing a system to align a language model with human preferences. Their data collection process involves providing a prompt to an existing, fine-tuned model, which then generates a single response. A human labeler then assigns a quality score from 1 to 10 to this single response. This process is repeated for thousands of different prompts. What is the most significant flaw in this methodology for the purpose of creating a robust preference-based reward model?
Arrange the following steps in the correct chronological order to describe the data collection process for training a reward model.
Designing a Data Collection Pipeline for a Creative Writing Assistant
Learn After
Imagine you are part of a team training a new AI assistant. A key step in this process involves providing the AI with a single question, generating multiple different responses to it, and then having human reviewers rank these responses from best to worst. This helps the AI learn what constitutes a high-quality answer. Which of the following questions would be most effective for this specific training step?
Optimizing an AI for Creative Brainstorming
Evaluating Prompts for AI Training