A reward model is being trained to prefer one machine-generated text response over another for a given input. Training minimizes a loss computed as the negative logarithm of a sigmoid applied to the difference between the reward scores of the preferred (y_w) and non-preferred (y_l) responses: L = -log σ(r(x, y_w) - r(x, y_l)). Given the following reward scores assigned by the model to a single pair of responses, which scenario contributes the least to the total loss, indicating the model is correctly differentiating between the responses?
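As a quick sanity check on the question, here is a minimal Python sketch of this loss evaluated on a few hypothetical score pairs (the pair values and the helper name pairwise_ranking_loss are illustrative, not from the source):

```python
import math

def pairwise_ranking_loss(r_w: float, r_l: float) -> float:
    """-log(sigmoid(r_w - r_l)), rewritten as log(1 + exp(-(r_w - r_l))) for numerical stability."""
    return math.log1p(math.exp(-(r_w - r_l)))

# Hypothetical (preferred, non-preferred) reward pairs; values are illustrative only.
for r_w, r_l in [(2.0, -1.0), (1.0, 0.5), (0.0, 0.0), (-0.5, 1.5)]:
    print(f"r_w={r_w:+.1f}  r_l={r_l:+.1f}  loss={pairwise_ranking_loss(r_w, r_l):.4f}")
```

The larger the margin by which the preferred response outscores the non-preferred one, the closer the sigmoid is to 1 and the closer the loss is to 0; a negative margin (the model ranking the pair backwards) drives the loss up.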
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Empirical Formulation of Pair-wise Ranking Loss
Empirical Pair-wise Ranking Loss for RLHF Reward Model
Regularized Pairwise Loss Function for Reward Model Training
Diagnosing Reward Model Training Issues
Analyzing Reward Model Performance via Loss Function