An AI development team aims to build a helpful and harmless chatbot. Their strategy involves creating a large dataset where human experts label thousands of potential chatbot responses to various prompts as either "aligned" or "not aligned." The team then trains the model to generate responses that match the "aligned" labels. Which statement best analyzes the fundamental weakness of relying solely on this data-fitting method for alignment?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Critique of an AI Alignment Strategy
True or False: If an AI development team could create a massive, perfectly labeled dataset covering a vast range of human interactions, training a large language model to perfectly replicate the 'good' labels in this dataset would be sufficient to ensure the model is fully aligned with human values.