Concept

Preference Dataset Sampling Operation

When computing the loss for a reward model, $\mathcal{D}_r$ denotes a dataset of tuples, each containing an input and a pair of outputs. The expression $(\mathbf{x}, \mathbf{y}_{k_1}, \mathbf{y}_{k_2}) \sim \mathcal{D}_r$ denotes a sampling operation that draws one tuple from $\mathcal{D}_r$ according to a given probability distribution. For example, a model input $\mathbf{x}$ could first be drawn from a uniform distribution, and a pair of outputs then drawn according to the conditional probability that $\mathbf{y}_{k_1}$ is preferred over $\mathbf{y}_{k_2}$ given $\mathbf{x}$, written $\Pr(\mathbf{y}_{k_1} \succ \mathbf{y}_{k_2} \mid \mathbf{x})$.
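The two-stage example can be sketched in Python. This is only an illustration under stated assumptions: the dictionary layout, the name `sample_preference_tuple`, and the toy preference probabilities are all hypothetical and do not come from the source. Each input maps to a list of candidate output pairs, and each pair carries an assumed value for the probability that its first output is preferred over its second.

```python
import random


def sample_preference_tuple(dataset, rng):
    """Draw (x, y_k1, y_k2) in two stages from a toy preference dataset.

    `dataset` maps each input x to a list of (y_k1, y_k2, p) entries, where p
    is an assumed value of Pr(y_k1 > y_k2 | x). This layout is hypothetical.
    """
    # Stage 1: draw the input x uniformly from the available inputs.
    x = rng.choice(sorted(dataset))
    # Stage 2: draw an output pair with weight proportional to its
    # preference probability given x.
    pairs = dataset[x]
    weights = [p for (_, _, p) in pairs]
    y_k1, y_k2, _ = rng.choices(pairs, weights=weights, k=1)[0]
    return x, y_k1, y_k2


# Toy data: the probabilities are made up for illustration.
toy_dataset = {
    "prompt A": [
        ("good answer", "bad answer", 0.9),
        ("bad answer", "good answer", 0.1),
    ],
    "prompt B": [
        ("short reply", "long reply", 1.0),
        ("long reply", "short reply", 0.0),
    ],
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_preference_tuple(toy_dataset, rng) for _ in range(1000)]
```

Each element of `samples` is one draw of $(\mathbf{x}, \mathbf{y}_{k_1}, \mathbf{y}_{k_2})$. In this toy setup, pairs with a higher assumed preference probability appear more often, and a pair with probability zero is never drawn.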

Updated 2026-04-20

Tags

Foundations of Large Language Models

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences