Multiple Choice

A human labeler is tasked with providing feedback on two different AI-generated summaries of an article, labeled Summary A and Summary B. After reviewing both, the labeler selects Summary B as the better one. In a typical system that uses pairwise comparisons to gather human feedback, how is this single preference decision mathematically encoded for the training process?
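As a rough illustration of the scenario in the question, the sketch below assumes one common convention: the single decision is stored as an ordered pair (preferred output first, dispreferred second), and a reward model is trained on it with a Bradley-Terry style loss, -log sigmoid(r_chosen - r_rejected). The names `PreferencePair`, `encode_preference`, and `bradley_terry_loss` are hypothetical and not taken from the course material, and a given system may encode preferences differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One labeler decision, stored as an ordered pair (hypothetical structure)."""
    prompt: str
    chosen: str    # the output the labeler preferred
    rejected: str  # the output the labeler did not prefer


def encode_preference(prompt: str, summary_a: str, summary_b: str,
                      preferred: str) -> PreferencePair:
    """Turn an 'A' or 'B' choice into an ordered (chosen, rejected) pair."""
    if preferred not in ("A", "B"):
        raise ValueError("preferred must be 'A' or 'B'")
    if preferred == "B":
        return PreferencePair(prompt, chosen=summary_b, rejected=summary_a)
    return PreferencePair(prompt, chosen=summary_a, rejected=summary_b)


def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Compute -log(sigmoid(reward_chosen - reward_rejected)) in a stable form."""
    diff = reward_chosen - reward_rejected
    # log(1 + exp(-diff)), rewritten so exp() never sees a large positive value
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# The labeler picked Summary B, so B becomes "chosen" and A becomes "rejected".
pair = encode_preference("article text", "Summary A text", "Summary B text", "B")
```

Under this assumption the loss is small when the reward model scores the chosen summary above the rejected one and grows as the ordering is violated, so the pair acts as a relative signal and not as an absolute score for either summary.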

Updated 2025-09-26

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science