Multiple Choice

An AI development team aims to build a helpful and harmless chatbot. Their strategy involves creating a large dataset where human experts label thousands of potential chatbot responses to various prompts as either "aligned" or "not aligned." The team then trains the model to generate responses that match the "aligned" labels. Which statement best analyzes the fundamental weakness of relying solely on this data-fitting method for alignment?

Updated 2025-10-01

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science