Short Answer

Invariance of Preference Probability

A system models the probability that a response y_a is preferred over y_b for a given input x using the formula: P(y_a > y_b | x) = Sigmoid(r(x, y_a) - r(x, y_b)), where r(x, y) is a learned quality score. If a new reward function r'(x, y) is created by adding the same constant value C to all original scores (i.e., r'(x, y) = r(x, y) + C), how does this change affect the calculated preference probabilities? Explain your reasoning.
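The invariance the question points at can be checked numerically. The sketch below implements the stated formula directly; the scores 2.3 and 1.1 and the shift C = 100 are made-up illustrative values, not anything from the question.

```python
import math

def sigmoid(z):
    # standard logistic function: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def preference_prob(r_a, r_b):
    # P(y_a > y_b | x) = Sigmoid(r(x, y_a) - r(x, y_b))
    return sigmoid(r_a - r_b)

# hypothetical reward scores for two responses to the same input x
r_a, r_b = 2.3, 1.1
C = 100.0  # arbitrary constant added to every score

p = preference_prob(r_a, r_b)
p_shifted = preference_prob(r_a + C, r_b + C)

# (r_a + C) - (r_b + C) = r_a - r_b, so the constant cancels
# and both probabilities are identical
print(p, p_shifted)
```

Because the probability depends on the scores only through their difference, any constant offset cancels before the sigmoid is applied, which is what the question asks the reader to explain.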

Updated 2025-10-09

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science