Multiple Choice

A reward model is being trained to prefer one machine-generated response over another for a given input. Training minimizes a loss defined as the negative logarithm of a sigmoid applied to the difference between the reward score of the preferred response ($R_{pref}$) and that of the non-preferred response ($R_{non\text{-}pref}$), i.e. $\mathcal{L} = -\log \sigma(R_{pref} - R_{non\text{-}pref})$. Given the following reward scores assigned by the model to a single pair of responses, which scenario contributes the least to the total loss, indicating that the model is correctly differentiating between the responses?

0

1
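The pairwise loss above can be sketched numerically. This is a minimal illustration, not the original question's scenarios: the score pairs below are hypothetical, chosen only to show that a larger margin $R_{pref} - R_{non\text{-}pref}$ yields a smaller loss, and that a zero margin gives $-\log 0.5 = \log 2$.

```python
import math

def reward_pair_loss(r_pref, r_nonpref):
    """Pairwise reward-model loss: -log(sigmoid(r_pref - r_nonpref))."""
    diff = r_pref - r_nonpref
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# Hypothetical score pairs: as the margin grows, the loss shrinks,
# so the pair the model separates most confidently contributes least.
for r_pref, r_nonpref in [(2.0, -1.0), (1.0, 0.5), (0.0, 0.0), (-1.0, 2.0)]:
    margin = r_pref - r_nonpref
    loss = reward_pair_loss(r_pref, r_nonpref)
    print(f"margin={margin:+.1f}  loss={loss:.4f}")
```

The scenario with the largest positive margin is the one contributing the least to the total loss.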

Updated 2025-10-02

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science