Learn Before
Designing a Data Collection Pipeline for a Creative Writing Assistant
You are an ML engineer at a startup building a creative writing assistant. To align the model with user preferences for 'helpful and imaginative' suggestions, you need to create a dataset for training a reward model. Describe a concrete, step-by-step process for this data collection phase. Your description should specify (1) what the language model will generate for a given input prompt (e.g., a story idea), and (2) what specific task the human labelers will perform with the generated content.
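The data collection process this question asks about can be sketched in code. The following is a minimal, hypothetical illustration (the names `generate_candidates`, `collect_preferences`, and `rank_fn` are illustrative, not from any particular library): for each prompt the model samples several candidate responses, a human labeler ranks them, and every (better, worse) pair becomes one training example for the reward model.

```python
import itertools

def generate_candidates(prompt, k=4):
    """Placeholder for sampling k diverse completions from the model.
    In practice this would call the assistant with temperature > 0."""
    return [f"[draft {i} for: {prompt}]" for i in range(k)]

def collect_preferences(prompts, rank_fn, k=4):
    """For each prompt, sample k candidate responses and ask a labeler
    (rank_fn) to order them best-first. Every ordered pair
    (chosen, rejected) becomes one reward-model training example."""
    dataset = []
    for prompt in prompts:
        candidates = generate_candidates(prompt, k)
        ranked = rank_fn(candidates)  # labeler returns best-first ordering
        for chosen, rejected in itertools.combinations(ranked, 2):
            dataset.append(
                {"prompt": prompt, "chosen": chosen, "rejected": rejected}
            )
    return dataset

# Simulated labeler: a fixed ordering stands in for human judgment here.
pairs = collect_preferences(["A dragon who fears fire"], rank_fn=lambda c: c, k=3)
# 3 ranked candidates yield C(3,2) = 3 preference pairs for this prompt.
```

Note the design choice: ranking k candidates yields k(k-1)/2 pairwise comparisons per prompt, which is why preference ranking is more data-efficient than scoring a single response in isolation.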
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of a User Prompt in RLHF
Training a Reward Model with Preference Data
Techniques for Generating Diverse Outputs in RLHF
A team is developing a system to align a language model with human preferences. Their data collection process involves providing a prompt to an existing, fine-tuned model, which then generates a single response. A human labeler then assigns a quality score from 1 to 10 to this single response. This process is repeated for thousands of different prompts. What is the most significant flaw in this methodology for the purpose of creating a robust preference-based reward model?
Arrange the following steps in the correct chronological order to describe the data collection process for training a reward model.