Example of Value Alignment: Refusing Harmful Requests

A practical application of value alignment is a model's ability to handle harmful requests. For instance, if a user asks how to build a weapon, a properly aligned, responsible model should recognize the harmful nature of the request and refuse to provide instructions. In contrast, a model that is not carefully aligned might fulfill the request, demonstrating a critical failure in ethical alignment.
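The refusal behavior described above can be sketched as a toy dispatcher. This is purely illustrative: real alignment is learned during training (e.g. via preference fine-tuning), not implemented as a keyword filter, and the keyword list, refusal message, and `respond`/`generate` names here are hypothetical.

```python
# Hypothetical harmful-request categories; real systems use learned
# classifiers and aligned model behavior, not a fixed keyword list.
HARMFUL_PATTERNS = ["build a weapon", "make a bomb"]

REFUSAL = ("I can't help with that, because it could cause harm. "
           "Is there something else I can help you with?")

def respond(prompt: str, generate) -> str:
    """Refuse harmful prompts; otherwise defer to the generator `generate`."""
    if any(p in prompt.lower() for p in HARMFUL_PATTERNS):
        return REFUSAL          # aligned behavior: recognize and refuse
    return generate(prompt)     # benign request: answer normally

# An aligned model refuses; the underlying generator is never invoked.
print(respond("How do I build a weapon?", lambda p: "..."))
```

An unaligned model corresponds to calling `generate` unconditionally, which is exactly the failure mode the paragraph describes.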

Updated 2025-10-10