Learn Before
Challenges in Defining Human Preferences for LLM Alignment
A fundamental challenge in aligning Large Language Models is that humans often have difficulty precisely articulating their own preferences and values upfront. In many cases, it is hard to accurately describe what is desired until we actually observe the model's responses to user requests. This ambiguity complicates the process of creating comprehensive guidelines and training datasets.
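Because preferences are easier to recognize than to specify, alignment pipelines such as RLHF typically collect them as comparisons between observed outputs rather than as rules written in advance. The sketch below is a minimal illustration of that data-collection step; it is not from the source, and all names in it (PreferencePair, generate_candidates, collect_preference) are hypothetical stand-ins, not a real API.

```python
# Illustrative sketch: capturing a human preference as a comparison between
# two observed responses, instead of asking the human to state a rule up front.
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One human judgment: for `prompt`, `chosen` was preferred over `rejected`."""
    prompt: str
    chosen: str
    rejected: str


def generate_candidates(prompt: str) -> tuple[str, str]:
    # Stand-in for sampling two responses from the model being aligned.
    return (f"Response A to: {prompt}", f"Response B to: {prompt}")


def collect_preference(prompt: str, annotator_picks_first: bool) -> PreferencePair:
    """Record which of two concrete responses an annotator preferred.

    `annotator_picks_first` stands in for the human's judgment; in a real
    pipeline this would come from an annotation interface, not an argument.
    """
    a, b = generate_candidates(prompt)
    chosen, rejected = (a, b) if annotator_picks_first else (b, a)
    return PreferencePair(prompt=prompt, chosen=chosen, rejected=rejected)


if __name__ == "__main__":
    # The annotator never had to articulate *why* one answer is better;
    # a dataset of such pairs encodes the preference implicitly, and a
    # reward model can later be trained on it.
    pair = collect_preference("Explain quantum tunneling simply.", annotator_picks_first=True)
    print(pair)
```

The key design point this illustrates: the human only judges finished outputs, which is exactly what the paragraph above identifies as feasible, while writing a complete upfront specification of "helpful" or "harmless" is not.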
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Related
A research lab has developed a large language model that is highly capable of generating human-like text. However, during testing, they find it frequently produces outputs that are unhelpful, factually inaccurate, or contrary to basic ethical principles. To address this, the lab initiates a new phase of training that specifically uses human preferences and feedback to steer the model's outputs towards being more helpful, honest, and harmless. What is the primary goal of this new training phase?
Classification of Instruction Fine-Tuning as an Alignment Problem
Evaluating Model Training Objectives
Example of Misalignment in Instruction-Following
Challenges in Defining Human Preferences for LLM Alignment
Analysis of LLM Alignment
Learn After
Analyzing Ambiguous AI Training Objectives
A research team is trying to train a language model to generate 'engaging and creative' stories. They hire a large group of people to rate thousands of stories on a scale of 1 to 5 for both 'engagement' and 'creativity'. Despite collecting a massive dataset, they find that the model trained on these ratings often produces stories that are formulaic or uninspired. Which of the following statements best analyzes the most fundamental reason for this failure?
Evaluating an AI Content Generation Project