Essay

The Gap Between Demonstration and Intent in LLM Training

An AI development team trains a large language model on a dataset of input prompts paired with 'ideal' responses written by human labelers. The model's sole training objective is to produce responses that are as statistically similar as possible to these ideal examples. Although the dataset contains only helpful, harmless, and honest examples, the model is later found to generate undesirable outputs in new situations. Analyze the fundamental reason for this failure. In your analysis, explain the disconnect between the model's training objective (mimicking demonstrated text) and the goal of instilling a deep, generalizable understanding of human values.
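To make the prompt's framing concrete, here is a minimal sketch of the kind of objective it describes: supervised fine-tuning minimizes the average negative log-likelihood of the labeler-written tokens. The function name `sft_loss` and the per-token probabilities are hypothetical, chosen only to illustrate that the objective rewards statistical imitation of the demonstration, with no term that represents the labelers' underlying intent.

```python
import math

def sft_loss(token_log_probs):
    """Average negative log-likelihood of the demonstrated 'ideal' tokens.

    This is the whole training signal: the model is rewarded solely for
    assigning high probability to the text the labelers happened to write,
    not for any judgment about *why* that text was helpful or harmless.
    """
    return -sum(token_log_probs) / len(token_log_probs)

# Hypothetical probabilities the model assigns to the four tokens of one
# demonstration. Higher probability on the demonstrated text -> lower loss.
demo_probs = [0.9, 0.8, 0.95, 0.7]
loss = sft_loss([math.log(p) for p in demo_probs])
print(round(loss, 4))
```

Note that nothing in this loss distinguishes an example that is ideal because of its values from one that is ideal for superficial stylistic reasons; that gap is the subject of the essay question.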


Updated 2025-10-07


Tags

Ch.4 Alignment - Foundations of Large Language Models


Analysis in Bloom's Taxonomy
