Re-weighting a Reference Probability Distribution with a Scaled Reward
The formula represents a method for adjusting a probability distribution from a reference model, denoted by π_θ: π(y|x) ∝ π_θ(y|x) * exp((1/β) * r(x, y)). The term π_θ(y|x) is the base probability of generating output y from input x according to the reference model parameterized by θ. This probability is then scaled by the exponential of a reward function r(x, y), which is itself scaled by an inverse temperature parameter, 1/β. The temperature β controls the extent to which the reward influences the final probability, with smaller values of β amplifying the effect of the reward.
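A minimal sketch of this re-weighting in Python, assuming a small discrete set of candidate outputs; the function name reweight and the probability and reward values below are illustrative, not taken from the source:

```python
import math

def reweight(ref_probs, rewards, beta):
    """Tilt reference probabilities by exp(r / beta) and renormalize.

    ref_probs: base probabilities pi_theta(y|x) for each candidate y
    rewards:   reward r(x, y) for each candidate y
    beta:      temperature; smaller beta lets the reward dominate more
    """
    # Unnormalized scores: pi_theta(y|x) * exp((1/beta) * r(x, y))
    scores = [p * math.exp(r / beta) for p, r in zip(ref_probs, rewards)]
    total = sum(scores)
    return [s / total for s in scores]

# Illustrative base probabilities and rewards for four candidate outputs
ref_probs = [0.40, 0.30, 0.20, 0.10]
rewards = [0.0, 1.0, 2.0, 3.0]

for beta in (0.5, 1.0, 2.0):
    print(beta, [round(p, 3) for p in reweight(ref_probs, rewards, beta)])
```

Printing the result for several values of β shows the smaller-β distributions concentrating probability mass on the high-reward candidates, while larger β stays closer to the reference distribution.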

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A language model is generating a completion for an input x. The model has a base probability distribution, π(y|x), over four potential completions y. To steer the model's output, a reward function, r(x, y), is applied to create a new unnormalized score for each completion using the formula Score(y) = π(y|x) * exp(r(x, y)). Given the values below, which completion will have the highest score? (A worked illustration of this scoring formula follows this list.)
True or false: when using the formula Score(y) = π(y|x) * exp(r(x, y)) to adjust the likelihood of a potential output y, setting the reward r(x, y) to zero will cause the final score for that output to become zero, effectively eliminating it from consideration.
Steering Language Model Output for Slogan Generation
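As a worked illustration of the Score(y) = π(y|x) * exp(r(x, y)) formula used in the two questions above; the candidate probabilities and rewards here are hypothetical, since the actual values the first question refers to are not included in this export:

```python
import math

def score(base_prob, reward):
    # Score(y) = pi(y|x) * exp(r(x, y))
    return base_prob * math.exp(reward)

# Hypothetical candidates: (base probability, reward). The highest combined
# score need not belong to the candidate with the highest base probability.
candidates = {"y1": (0.50, 0.0), "y2": (0.30, 1.0), "y3": (0.15, 2.0), "y4": (0.05, 3.0)}
for y, (p, r) in candidates.items():
    print(y, round(score(p, r), 3))

# Note that exp(0) = 1, so a zero reward leaves the base probability
# unchanged rather than zeroing the score out.
```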
Learn After
An AI text generation system adjusts the likelihood of different outputs using the formula: New_Likelihood = Base_Likelihood * exp((1/β) * Reward). In this formula, 'Base_Likelihood' is the initial probability from a reference model, 'Reward' is a score for the output's quality, and 'β' is a positive 'temperature' parameter. A team wants to use this system to generate a diverse set of creative, high-quality story endings. They are comparing two settings for the temperature parameter: β = 0.5 and β = 2.0. Which setting should they choose to better achieve their goal, and why? (A small numeric sketch comparing the two settings follows this list.)
Tuning a Generative Model for Different Tasks
Effect of Temperature Scaling on a Reward-Modified Distribution
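A small numeric sketch of the temperature comparison above, assuming a uniform base distribution and hypothetical reward scores; Shannon entropy is used here as a rough proxy for diversity:

```python
import math

def tilted(ref_probs, rewards, beta):
    # New_Likelihood is proportional to Base_Likelihood * exp((1/beta) * Reward)
    weights = [p * math.exp(r / beta) for p, r in zip(ref_probs, rewards)]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    # Shannon entropy in bits; higher means a more diverse distribution
    return -sum(p * math.log2(p) for p in probs if p > 0)

ref_probs = [0.25, 0.25, 0.25, 0.25]  # hypothetical uniform base distribution
rewards = [0.2, 0.9, 1.0, 0.8]        # hypothetical quality scores

for beta in (0.5, 2.0):
    probs = tilted(ref_probs, rewards, beta)
    print(f"beta={beta}: {[round(p, 3) for p in probs]} entropy={entropy(probs):.3f} bits")
```

Under these assumed values, β = 0.5 produces a sharper distribution (lower entropy) that concentrates on the highest-reward endings, while β = 2.0 retains more probability mass across candidates, which is the relevant trade-off for the question's diversity goal.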