Formula for Re-weighting a Probability Distribution with a Reward Function

The expression $\pi(y|x) \exp(r(x, y))$ defines a method for adjusting a base probability distribution, $\pi(y|x)$, using a reward function, $r(x, y)$. The term $\pi(y|x)$ represents the initial probability of an output $y$ given an input $x$. This probability is multiplied by $\exp(r(x, y))$, the exponential of the reward associated with that specific input-output pair. The product is generally not normalized, so it acts as a weight: once the weights are divided by their sum over all outputs $y$, outputs with higher rewards gain probability relative to outputs with lower rewards.
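As a minimal sketch of this re-weighting on a small discrete output set, the Python below multiplies each base probability by the exponential of its reward and then normalizes. The function name `reweight`, the three candidate outputs, and all numeric values are hypothetical and chosen only for illustration. The normalization step is an assumption added so that the result is a proper distribution.

```python
import math


def reweight(base_probs, rewards):
    """Re-weight a base distribution pi(y|x) by exp(r(x, y)) for a fixed input x.

    base_probs: dict mapping each output y to pi(y|x)
    rewards:    dict mapping each output y to r(x, y)
    Returns a dict mapping each output y to its normalized re-weighted probability.
    """
    # Unnormalized weight for each output: pi(y|x) * exp(r(x, y)).
    weights = {y: p * math.exp(rewards[y]) for y, p in base_probs.items()}
    # Sum of the weights over all outputs (assumed normalization step).
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


# Hypothetical base distribution and rewards for a single input x.
base = {"a": 0.5, "b": 0.3, "c": 0.2}
rewards = {"a": 0.0, "b": 1.0, "c": 2.0}

reweighted = reweight(base, rewards)
for y in base:
    print(f"{y}: base={base[y]:.3f} reweighted={reweighted[y]:.3f}")
```

In this toy setting, the highest-reward output `c` ends up with more probability than it had under the base distribution, and the lowest-reward output `a` ends up with less.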

Updated 2025-10-09

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences