Short Answer

Impact of Data Distribution on Reward Model Training

A team is training a reward model on a dataset of 10,000 preference pairs. They notice that 2,000 of these pairs all use a single prompt, 'Write a story about a robot,' while the remaining 8,000 pairs are spread across 4,000 other unique prompts. Given the standard empirical loss used for this training:

$$\mathcal{L}_r(\phi) = -\frac{1}{|\mathcal{D}_r|} \sum_{(\mathbf{x},\mathbf{y}_a,\mathbf{y}_b)\in\mathcal{D}_r} \log \text{Pr}_{\phi}(\mathbf{y}_a \succ \mathbf{y}_b \mid \mathbf{x})$$

Analyze the most likely consequence of this data distribution on the trained reward model's behavior, and explain how the structure of the formula leads to this outcome.
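As a starting point for the analysis, a small back-of-the-envelope calculation (using only the counts given in the question; the variable names are illustrative) shows how much of the loss, and hence of the gradient, the single over-represented prompt receives under the uniform per-pair average in the formula above:

```python
# Counts from the question: 10,000 preference pairs in total,
# 2,000 of which share one prompt; the rest cover 4,000 unique prompts.
total_pairs = 10_000
robot_pairs = 2_000
other_prompts = 4_000

# Because L_r averages uniformly over pairs, each pair contributes
# 1/|D_r| to the loss, so a prompt's share of the gradient is
# proportional to its pair count, not to its count as a unique prompt.
robot_loss_share = robot_pairs / total_pairs      # 0.20
per_prompt_share = 1 / (1 + other_prompts)        # ~0.00025 if prompts were weighted equally

print(f"robot prompt's share of the loss: {robot_loss_share:.0%}")
print(f"its share under per-prompt weighting: {per_prompt_share:.4%}")
print(f"over-weighting factor: {robot_loss_share / per_prompt_share:.0f}x")
```

The arithmetic suggests where to look: the uniform sum over pairs gives the repeated prompt hundreds of times more influence than any other single prompt, which is the structural property the question asks you to connect to the model's behavior.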


Updated 2025-10-07


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science