Learn Before
A research team is developing a large language model to provide helpful and safe responses. They implement an iterative process where the model first generates a response, then critiques its own response against a set of principles (e.g., 'is the response factually accurate?', 'is it free of harmful bias?'), and finally, revises the response based on the critique. How does viewing this self-improvement process as an 'alignment problem' provide the most accurate analysis of the team's goal?
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Activating Self-Correction via RLHF
Analyzing Misaligned Self-Refinement
Connecting Self-Refinement and Alignment
Evaluating the 'Alignment' Framing of Self-Refinement