1Cademy - In a system that collects human feedback by presenting two model-generated responses for comparison, if a human evaluator strongly prefers response A over response B, this preference is encoded with a higher numerical value than if they only slightly preferred response A.

Learn Before

Binary Encoding of Pairwise Feedback in RLHF

True/False

In a system that collects human feedback by presenting two model-generated responses for comparison, if a human evaluator strongly prefers response A over response B, this preference is encoded with a higher numerical value than if they only slightly preferred response A.

Updated 2025-10-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences