Learn Before
Analyzing Language Model Response Probabilities
Given that a model's generation probability for a response is proportional to the exponential of its reward score, analyze the provided case study. Compare the likely difference in generation probabilities between Response A and Response B with the difference between Response B and Response C. What does this comparison reveal about how the formulation helps the model distinguish between high-quality and low-quality responses?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Plackett-Luce Selection Probability Formula
Optimal Policy as a Product of Reference Policy and Exponentiated Reward
Worth Function in Plackett-Luce Model
A language model's policy, which determines the probability of generating an output y given an input x, is structured to be proportional to the exponential of a reward score r(x, y). For a specific input, two potential outputs have the following reward scores:
- Output A: Reward = 3.0
- Output B: Reward = 1.0
Based on this formulation, how does the probability of generating Output A compare to the probability of generating Output B?
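One way to work through this comparison is a minimal sketch, assuming the policy is exactly proportional to exp(r(x, y)) with no additional scaling factor or reference-policy term. The helper name `probability_ratio` is hypothetical and introduced only for illustration. Because both outputs share the same normalizing constant, the ratio of their probabilities depends only on the difference between their rewards.

```python
import math


def probability_ratio(reward_a: float, reward_b: float) -> float:
    """Return P(A) / P(B) when P(y | x) is proportional to exp(r(x, y)).

    Hypothetical helper: the shared normalizing constant cancels,
    leaving exp(r_A) / exp(r_B) = exp(r_A - r_B).
    """
    return math.exp(reward_a - reward_b)


# Reward scores taken from the question above.
ratio = probability_ratio(3.0, 1.0)
print(f"P(A) / P(B) = {ratio:.3f}")  # prints "P(A) / P(B) = 7.389"
```

Under this assumption, a reward gap of 2.0 corresponds to a probability ratio of e^2, so Output A would be roughly 7.4 times as likely as Output B.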
Analyzing Language Model Response Probabilities
A language model's policy is designed such that the probability of generating an output is proportional to the exponential of its reward score. If Output Y has a reward score that is exactly double the reward score of Output Z, it means the policy will assign exactly double the probability to Output Y compared to Output Z.
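A statement like this can be checked numerically. The sketch below assumes a toy setting in which only two candidate outputs exist, so the policy is a softmax over their rewards. The function name `policy_from_rewards` and the reward pairs are hypothetical values chosen for illustration, each with the reward of Y set to double the reward of Z.

```python
import math


def policy_from_rewards(rewards):
    """Normalize exp(reward) over a finite candidate set (toy assumption)."""
    # Subtract the maximum reward for numerical stability; ratios are unchanged.
    max_reward = max(rewards)
    weights = [math.exp(r - max_reward) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical (r_Y, r_Z) pairs in which r_Y is exactly double r_Z.
reward_pairs = [(2.0, 1.0), (4.0, 2.0), (10.0, 5.0)]

ratios = []
for r_y, r_z in reward_pairs:
    p_y, p_z = policy_from_rewards([r_y, r_z])
    ratios.append(p_y / p_z)
    print(f"r_Y={r_y}, r_Z={r_z}: P(Y) / P(Z) = {p_y / p_z:.3f}")
```

In this sketch the probability ratio equals exp(r_Y - r_Z), so it changes with the size of the gap between the rewards rather than staying fixed at 2.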