Learn Before
  • Data Collection for Reward Modeling in RLHF

Example

Example of a User Prompt in RLHF

An example of a user prompt that an LLM might receive at the beginning of the Reinforcement Learning from Human Feedback (RLHF) process is: "How can I live a more environmentally friendly life?" A prompt like this is used to sample multiple candidate responses from the model, which human annotators then compare and rank to produce preference data for reward modeling.
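The pipeline described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: `generate_responses` is a stand-in for sampling from a fine-tuned LLM, and the ranking is a made-up labeler judgment. It shows how a single best-to-worst ranking over K responses is commonly expanded into (chosen, rejected) pairs for reward-model training.

```python
import itertools
import random

def generate_responses(prompt, k=4, seed=0):
    """Stand-in for sampling k diverse completions from a fine-tuned LLM.
    A real pipeline would call the model with a nonzero temperature."""
    rng = random.Random(seed)
    return [f"response {i} to {prompt!r} ({rng.randint(0, 999)})" for i in range(k)]

def ranking_to_pairs(responses, ranking):
    """Convert a human ranking (indices, best to worst) into (chosen, rejected)
    pairs, the format typically used to train a preference-based reward model."""
    ordered = [responses[i] for i in ranking]
    return list(itertools.combinations(ordered, 2))  # each earlier item beats each later one

prompt = "How can I live a more environmentally friendly life?"
responses = generate_responses(prompt, k=3)
ranking = [2, 0, 1]  # hypothetical labeler judgment: response 2 best, response 1 worst
pairs = ranking_to_pairs(responses, ranking)
print(len(pairs))  # C(3, 2) = 3 preference pairs from one ranked triple
```

Expanding rankings into pairwise comparisons is why ranking K responses per prompt is far more data-efficient than scoring single responses in isolation.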


Updated 2025-10-09

Contributors:

  • Gemini AI (Google)

References


  • Foundations of Large Language Models Course

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Related
  • Example of a User Prompt in RLHF

  • Training a Reward Model with Preference Data

  • Techniques for Generating Diverse Outputs in RLHF

  • A team is developing a system to align a language model with human preferences. Their data collection process involves providing a prompt to an existing, fine-tuned model, which then generates a single response. A human labeler then assigns a quality score from 1 to 10 to this single response. This process is repeated for thousands of different prompts. What is the most significant flaw in this methodology for the purpose of creating a robust preference-based reward model?

  • Arrange the following steps in the correct chronological order to describe the data collection process for training a reward model.

  • Designing a Data Collection Pipeline for a Creative Writing Assistant

Learn After
  • Imagine you are part of a team training a new AI assistant. A key step in this process involves providing the AI with a single question, generating multiple different responses to it, and then having human reviewers rank these responses from best to worst. This helps the AI learn what constitutes a high-quality answer. Which of the following questions would be most effective for this specific training step?

  • Optimizing an AI for Creative Brainstorming

  • Evaluating Prompts for AI Training

© 1Cademy 2026