An AI development team aims to build a helpful and harmless chatbot. Their strategy involves creating a large dataset where human experts label thousands of potential chatbot responses to various prompts as either "aligned" or "not aligned." The team then trains the model to generate responses that match the "aligned" labels. Which statement best analyzes the fundamental weakness of relying solely on this data-fitting method for alignment?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Critique of an AI Alignment Strategy
True or False: If an AI development team could create a massive, perfectly labeled dataset covering a vast range of human interactions, training a large language model to perfectly replicate the 'good' labels in this dataset would be sufficient to ensure the model is fully aligned with human values.