1Cademy - An AI development team trains a large language model to be helpful and harmless. They create a massive dataset containing millions of examples of harmful user prompts, each paired with a safe, refusal-to-answer response (e.g., I cannot fulfill this request.). After training, they find the model still generates subtly harmful or biased content in response to novel, cleverly phrased prompts that were not in the training data. Which of the following statements best analyzes the fundamental reason

Learn Before

Insufficiency of Data Fitting for Aligning with Human Values

Multiple Choice

An AI development team trains a large language model to be helpful and harmless. They create a massive dataset containing millions of examples of harmful user prompts, each paired with a safe, refusal-to-answer response (e.g., "I cannot fulfill this request."). After training, they find the model still generates subtly harmful or biased content in response to novel, cleverly phrased prompts that were not in the training data. Which of the following statements best analyzes the fundamental reason

Updated 2025-09-26

Contributors are:

Who are from:

Learn Before

Related