
Learn Before

Regularized Pairwise Loss Function for Reward Model Training

Multiple Choice

A reward model is being trained with a loss function that includes a regularization component. This component adds a penalty proportional to $(r(\mathbf{x}, \mathbf{y}_a) + r(\mathbf{x}, \mathbf{y}_b))^2$ for a given input $\mathbf{x}$ and a pair of responses $(\mathbf{y}_a, \mathbf{y}_b)$. The goal of this penalty is to prevent reward scores from becoming excessively large. Consider two scenarios for the reward scores assigned to a pair of responses:

- Scenario 1: $r(\mathbf{x}, \mathbf{y}_a) = 10$ and $r(\mathbf{x}, \mathbf{y}_b) = -10$
- Scenario 2: $r(\mathbf{x}, \mathbf{y}_a) = 5$ and $r(\mathbf{x}, \mathbf{y}_b) = 5$

Based on the formula for the penalty, which of the following statements correctly analyzes the effect of the regularization in these two scenarios?
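
The penalty can be evaluated directly for both scenarios. The following is a minimal sketch, assuming a proportionality constant of 1 (the question does not specify one); the helper name `sum_squared_penalty` and the `coeff` parameter are hypothetical and introduced only for illustration.

```python
def sum_squared_penalty(r_a: float, r_b: float, coeff: float = 1.0) -> float:
    """Penalty proportional to (r_a + r_b)^2.

    coeff is an assumed proportionality constant, not given in the question.
    """
    return coeff * (r_a + r_b) ** 2


# Scenario 1: large scores of opposite sign
scenario_1 = sum_squared_penalty(10.0, -10.0)

# Scenario 2: moderate scores of the same sign
scenario_2 = sum_squared_penalty(5.0, 5.0)

print(f"Scenario 1 penalty: {scenario_1}")  # 0.0
print(f"Scenario 2 penalty: {scenario_2}")  # 100.0
```

Because the term depends only on the sum of the two scores, it evaluates to zero in Scenario 1 even though each score has magnitude 10, while Scenario 2 receives a penalty of 100 (times the constant) despite its smaller individual scores. In other words, this form of regularization pulls the pair's scores toward being centered around zero rather than bounding each score's magnitude.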


Updated 2025-10-04
