Critique of an LLM Alignment Strategy
Based on the principles of AI alignment, critically evaluate the fundamental weakness of the company's strategy described in the case study. Explain why this approach, even with a very large dataset, is likely insufficient to ensure the AI consistently aligns with human preferences, especially in novel scenarios.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An AI development team trains a large language model to be helpful and harmless. They create a massive dataset containing millions of examples of harmful user prompts, each paired with a safe refusal response (e.g., "I cannot fulfill this request."). After training, they find the model still generates subtly harmful or biased content in response to novel, cleverly phrased prompts that were not in the training data. Which of the following statements best analyzes the fundamental reason for the model's failure?
Critique of a Data-Centric Alignment Strategy