Multiple Choice

A researcher is training a reward model using a small preference dataset, $\mathcal{D}_r$, which contains exactly two preference pairs:

  1. For input $\mathbf{x}_1$, response $\mathbf{y}_{1a}$ is preferred over $\mathbf{y}_{1b}$.
  2. For input $\mathbf{x}_2$, response $\mathbf{y}_{2a}$ is preferred over $\mathbf{y}_{2b}$.

Given the empirical loss formula $\mathcal{L}_r(\phi) = -\frac{1}{|\mathcal{D}_r|} \sum_{(\mathbf{x},\mathbf{y}_a,\mathbf{y}_b)\in\mathcal{D}_r} \log \text{Pr}_{\phi}(\mathbf{y}_a \succ \mathbf{y}_b \mid \mathbf{x})$, which of the following expressions correctly represents the loss for this specific dataset?
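
As a rough illustration of how the formula applies to a dataset of this size, the sketch below averages the negative log-probabilities over two preference pairs. The function name `empirical_reward_loss` and the probability values 0.8 and 0.6 are hypothetical, chosen only for illustration; in practice each probability would come from the reward model with parameters phi.

```python
import math


def empirical_reward_loss(pref_probs):
    """Average negative log-likelihood over a list of preference pairs.

    pref_probs[i] is the model's probability that the preferred response
    beats the rejected one for pair i (hypothetical inputs).
    """
    if not pref_probs:
        raise ValueError("the preference dataset must not be empty")
    return -sum(math.log(p) for p in pref_probs) / len(pref_probs)


# Assumed model outputs for the two pairs in the dataset (illustrative only).
p1 = 0.8  # Pr(y_1a preferred over y_1b given x_1)
p2 = 0.6  # Pr(y_2a preferred over y_2b given x_2)

loss = empirical_reward_loss([p1, p2])

# With two pairs, the sum has two terms and the normalizer is 1/2.
expanded = -0.5 * (math.log(p1) + math.log(p2))

print(loss, expanded)
```

Because the dataset holds two pairs, the general sum reduces to two log-probability terms scaled by one half, which is what `expanded` writes out by hand.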


Updated 2025-10-07


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science