Evaluating the 'Alignment' Framing of Self-Refinement
A prominent viewpoint in AI development holds that improving a large language model's self-refinement capabilities is fundamentally an alignment problem. Critically evaluate this viewpoint. In your answer, argue why this framing is useful, and discuss a scenario in which unguided self-refinement could lead to a misaligned outcome even as the model becomes more effective at its self-defined task.
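For concreteness, the generate-critique-revise loop at issue can be sketched as follows. This is a minimal illustration, not any particular team's implementation: the `model` callable is a hypothetical stand-in for an LLM completion API, and the `PRINCIPLES` list, prompt wording, and stopping rule are invented for the example.

```python
from typing import Callable

# Assumed example principles; real systems would use a curated set.
PRINCIPLES = [
    "Is the response factually accurate?",
    "Is it free of harmful bias?",
]

def self_refine(
    model: Callable[[str], str],  # hypothetical: maps a prompt to a completion
    task: str,
    max_rounds: int = 3,
) -> str:
    # Step 1: initial generation.
    response = model(f"Answer the following request:\n{task}")
    for _ in range(max_rounds):
        # Step 2: the model critiques its own response against each principle.
        critique = model(
            "Critique the response below against these principles:\n"
            + "\n".join(f"- {p}" for p in PRINCIPLES)
            + f"\n\nResponse:\n{response}\n\n"
            "If it satisfies every principle, reply with exactly: OK"
        )
        # Stop once the self-critique raises no objections. Note that the
        # stopping criterion is itself model-defined: the loop optimizes
        # whatever the critic rewards, which is the crux of the alignment
        # framing this question asks about.
        if critique.strip() == "OK":
            break
        # Step 3: revise the response in light of the critique.
        response = model(
            f"Original request:\n{task}\n\nDraft response:\n{response}\n\n"
            f"Critique:\n{critique}\n\n"
            "Rewrite the response to address the critique."
        )
    return response

if __name__ == "__main__":
    # Trivial canned-output stub so the sketch runs end to end without an API.
    canned = iter(["Draft answer.", "Critique: add a citation.", "Revised answer.", "OK"])
    print(self_refine(lambda prompt: next(canned), "Summarize the report."))
```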
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Activating Self-Correction via RLHF
A research team is developing a large language model to provide helpful and safe responses. They implement an iterative process in which the model first generates a response, then critiques its own response against a set of principles (e.g., 'Is the response factually accurate?', 'Is it free of harmful bias?'), and finally revises the response based on the critique. How does viewing this self-improvement process as an 'alignment problem' provide the most accurate analysis of the team's goal?
Analyzing Misaligned Self-Refinement
Connecting Self-Refinement and Alignment