Formula for Re-weighting a Probability Distribution with a Reward Function
The expression π(y|x) * exp(r(x, y)) defines a method for adjusting a base probability distribution, π(y|x), using a reward function, r(x, y). The term π(y|x) represents the base probability of an output y given an input x. This probability is multiplied by exp(r(x, y)), the exponential of the reward associated with that specific input-output pair. The re-weighting increases the relative likelihood of outputs with higher rewards.
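A minimal sketch of this re-weighting, using hypothetical completions and illustrative probability and reward values (none of these numbers come from the text):

```python
import math

# Hypothetical base distribution pi(y|x) over four completions for some input x,
# and a reward r(x, y) for each completion (illustrative values only).
base = {"blue": 0.50, "clear": 0.30, "falling": 0.15, "green": 0.05}
reward = {"blue": 1.0, "clear": 0.0, "falling": -1.0, "green": 2.0}

# Re-weight: Score(y) = pi(y|x) * exp(r(x, y))
scores = {y: p * math.exp(reward[y]) for y, p in base.items()}

# The scores are unnormalized; divide by their sum to recover a
# proper probability distribution.
total = sum(scores.values())
reweighted = {y: s / total for y, s in scores.items()}
```

Note that a completion with reward 0 keeps its base probability as its unnormalized score (exp(0) = 1), while positive rewards inflate scores and negative rewards shrink them.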

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Related
A language model is generating text and has so far produced the sequence 'The sky is'. The model now needs to calculate the likelihood of the next word being 'blue'. Which of the following mathematical expressions correctly represents the probability of the next word being 'blue', given the preceding words?
Conditional Probability in Sequence-to-Sequence Generation
Notation for Machine Translation Probability
Applying Conditional Probability Notation in Text Summarization
Learn After
Re-weighting a Reference Probability Distribution with a Scaled Reward
A language model is generating a completion for an input x. The model has a base probability distribution, π(y|x), for four potential completions (y). To steer the model's output, a reward function, r(x, y), is applied to create a new unnormalized score for each completion using the formula Score(y) = π(y|x) * exp(r(x, y)). Given the values below, which completion will have the highest score?
When using the formula Score(y) = π(y|x) * exp(r(x, y)) to adjust the likelihood of a potential output y, setting the reward r(x, y) to zero leaves the final score equal to the base probability π(y|x), since exp(0) = 1; it does not eliminate that output from consideration.
Steering Language Model Output for Slogan Generation
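A quick check of the zero-reward case, with a hypothetical base probability chosen purely for illustration:

```python
import math

# With zero reward, exp(0) = 1, so the score equals the base probability.
p = 0.30                   # hypothetical pi(y|x) for some completion y
score = p * math.exp(0.0)  # Score(y) = pi(y|x) * exp(0)
print(score)               # 0.3 — unchanged, not driven to zero
```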