
Example of Misalignment in Instruction-Following

An example of misalignment occurs when an LLM, asked how to hack into a computer, complies and provides instructions for the illegal activity. Although this response technically follows the user's instruction, a properly aligned model would instead refuse the harmful request and explain why. This scenario highlights the critical difference between simple instruction-following and genuine alignment with human values and safety principles.

Updated 2026-05-01

Tags: Ch.4 Alignment - Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Ch.2 Generative Models - Foundations of Large Language Models