Concept

Alignment as a Segment Classification Problem

In some alignment tasks, such as judging whether a response is ethical, the problem can be framed as classification at the segment level. Rather than receiving a continuous score, each segment of a response is assigned one of a set of discrete classes, for instance 'ethical' or 'unethical'. The labels can come from human annotators or from automated classifiers.
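
The segment-level framing can be illustrated with a minimal sketch. It assumes sentence-level segmentation and uses a toy keyword rule in place of a human annotator or a trained classifier. The names `split_into_segments`, `toy_classifier`, `label_segments`, and `UNETHICAL_CUES` are hypothetical and chosen only for illustration.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical cue words; a stand-in for a trained classifier or human labels.
UNETHICAL_CUES = ("steal", "deceive", "harm")


def split_into_segments(response: str) -> List[str]:
    """Split a response into sentence-level segments (an assumed segmentation)."""
    parts = re.split(r"(?<=[.!?])\s+", response.strip())
    return [p for p in parts if p]


def toy_classifier(segment: str) -> str:
    """Assign a discrete class to one segment using a simple keyword rule."""
    lowered = segment.lower()
    if any(cue in lowered for cue in UNETHICAL_CUES):
        return "unethical"
    return "ethical"


def label_segments(
    response: str,
    classifier: Callable[[str], str] = toy_classifier,
) -> List[Tuple[str, str]]:
    """Return (segment, label) pairs: one discrete class per segment, no score."""
    return [(seg, classifier(seg)) for seg in split_into_segments(response)]


example = (
    "You should return the wallet to its owner. "
    "Alternatively, you could steal the cash inside."
)
for segment, label in label_segments(example):
    print(f"{label}: {segment}")
```

Because the classifier is passed in as a function, the keyword rule could be replaced by a learned model or a lookup of human annotations without changing the labeling loop.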

Updated 2026-05-03

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences