Formula for Re-weighting a Probability Distribution with a Reward Function
The expression π(y|x) * exp(r(x, y)) defines a method for adjusting a base probability distribution, π(y|x), using a reward function, r(x, y). The term π(y|x) represents the base probability of an output y given an input x. This probability is multiplied by exp(r(x, y)), the exponential of the reward associated with that specific input-output pair. The re-weighting increases the relative likelihood of outputs with higher rewards.
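A minimal sketch of this re-weighting, using hypothetical completions and illustrative probability and reward values (none of these numbers come from the text):

```python
import math

# Hypothetical base distribution pi(y|x) over four completions for some input x,
# and a reward r(x, y) for each completion (illustrative values only).
base = {"blue": 0.50, "clear": 0.30, "falling": 0.15, "green": 0.05}
reward = {"blue": 1.0, "clear": 0.0, "falling": -1.0, "green": 2.0}

# Re-weight: Score(y) = pi(y|x) * exp(r(x, y))
scores = {y: p * math.exp(reward[y]) for y, p in base.items()}

# The scores are unnormalized; divide by their sum to recover a
# proper probability distribution.
total = sum(scores.values())
reweighted = {y: s / total for y, s in scores.items()}
```

Note that a completion with reward 0 keeps its base probability as its unnormalized score (exp(0) = 1), while positive rewards inflate scores and negative rewards shrink them.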

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Related
A language model is generating text and has so far produced the sequence 'The sky is'. The model now needs to calculate the likelihood of the next word being 'blue'. Which of the following mathematical expressions correctly represents the probability of the next word being 'blue', given the preceding words?
Conditional Probability in Sequence-to-Sequence Generation
Notation for Machine Translation Probability
Applying Conditional Probability Notation in Text Summarization
Learn After
Re-weighting a Reference Probability Distribution with a Scaled Reward
A language model is generating a completion for an input x. The model has a base probability distribution, π(y|x), for four potential completions (y). To steer the model's output, a reward function, r(x, y), is applied to create a new unnormalized score for each completion using the formula Score(y) = π(y|x) * exp(r(x, y)). Given the values below, which completion will have the highest score?
When using the formula Score(y) = π(y|x) * exp(r(x, y)) to adjust the likelihood of a potential output y, setting the reward r(x, y) to zero leaves the final score equal to the base probability π(y|x), since exp(0) = 1; it does not eliminate that output from consideration.
Steering Language Model Output for Slogan Generation
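A quick check of the zero-reward case, with a hypothetical base probability chosen purely for illustration:

```python
import math

# With zero reward, exp(0) = 1, so the score equals the base probability.
p = 0.30                   # hypothetical pi(y|x) for some completion y
score = p * math.exp(0.0)  # Score(y) = pi(y|x) * exp(0)
print(score)               # 0.3 — unchanged, not driven to zero
```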