Learn Before
Differing Motivations of Instruction and Human Preference Alignment
Instruction alignment and human preference alignment are driven by distinct goals. Instruction alignment is primarily motivated by the need to make a model generate outputs that adhere closely to explicit user commands, whereas human preference alignment is motivated by the need to train a model on broader, often implicit, human feedback about which outputs people actually prefer.
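To make the contrast concrete, here is a minimal sketch of the two training signals (a toy illustration, not the course's implementation; the function names and the Bradley-Terry pairwise form are assumptions):

```python
import math

def sft_loss(logprob_of_ideal_response: float) -> float:
    # Instruction alignment: supervised fine-tuning minimizes the
    # negative log-likelihood of the single human-written response
    # that correctly follows the instruction.
    return -logprob_of_ideal_response

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    # Human preference alignment: a reward model is fit so that the
    # human-preferred response scores higher than the rejected one
    # (a Bradley-Terry style pairwise loss).
    prob_chosen = 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))
    return -math.log(prob_chosen)
```

The first objective needs one gold response per prompt; the second needs only relative judgments between responses, which is exactly the distinction the datasets in the scenario below illustrate.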
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Surrogate Objectives in AI Alignment
Combined Use of Instruction and Human Preference Alignment
Differing Motivations of Instruction and Human Preference Alignment
Instruction Alignment
Human Preference Alignment via Reward Models
A development team is working to improve a large language model's behavior. They create two distinct datasets:
- Dataset 1: A curated set of prompts, each paired with a single, ideal, human-written response that demonstrates how to follow the prompt's instructions correctly.
- Dataset 2: A set of prompts where, for each prompt, a human evaluator has ranked several different model-generated responses from best to worst.
Which statement best analyzes the relationship between these datasets and the two fundamental approaches to model alignment?
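A rough sketch of how the two datasets might be represented (the field names and example entries are illustrative assumptions drawn from this page's examples, not from the course):

```python
# Dataset 1: each prompt is paired with one ideal, human-written
# response. This is the supervised fine-tuning (instruction
# alignment) format.
instruction_data = [
    {
        "prompt": "List three benefits of solar power.",
        "ideal_response": "1. Renewable energy source. "
                          "2. Low operating costs. "
                          "3. Reduced carbon emissions.",
    },
]

# Dataset 2: each prompt has several model-generated responses ranked
# best-to-worst by a human evaluator. This is the preference data
# used to train a reward model.
preference_data = [
    {
        "prompt": "Write a short, encouraging note.",
        "responses_ranked": [
            "You're doing great. Keep going!",  # ranked best
            "Good luck with everything.",
            "Notes are hard to write.",         # ranked worst
        ],
    },
]
```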
Match each fundamental model alignment approach with its primary goal and typical implementation method.
Prioritizing Chatbot Alignment Strategies
Learn After
A research team is refining a language model using two distinct methods.
- Method A: Train the model on a large dataset of specific commands paired with ideal, human-written responses that perfectly execute those commands (e.g., Command: 'List three benefits of solar power.' Ideal Response: a list of exactly three benefits).
- Method B: Show human raters two different model-generated responses to the same open-ended prompt (e.g., 'Write a short, encouraging note') and ask the raters to choose which response they prefer; the model is then updated based on these preferences.
What fundamental difference in goals do these two methods represent?
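As a toy illustration of how Method B's rater choices could drive an update (a minimal sketch under assumed names; the scalar rewards, learning rate, and update loop are illustrative, not the team's actual procedure):

```python
import math

# Start both candidate responses with equal scalar rewards.
rewards = {"You've got this!": 0.0, "Good luck.": 0.0}
preferred, other = "You've got this!", "Good luck."
lr = 0.1

for _ in range(100):
    diff = rewards[preferred] - rewards[other]
    p_prefer = 1.0 / (1.0 + math.exp(-diff))  # Bradley-Terry probability
    grad = p_prefer - 1.0                     # gradient of -log(p_prefer) w.r.t. diff
    rewards[preferred] -= lr * grad           # push the preferred reward up
    rewards[other] += lr * grad               # push the rejected reward down

print(rewards)  # the rater-preferred response ends with the higher reward
```

Method A, by contrast, would simply minimize cross-entropy against the single ideal response, as in the supervised fine-tuning sketch earlier on this page.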
Diagnosing Chatbot Performance Issues
A development team is working on two separate improvement goals for their language model. Match each goal with the alignment methodology it primarily represents.