1Cademy - Analyzing Language Model Response Probabilities

Learn Before

Policy Proportional to Exponentiated Reward

Case Study

Analyzing Language Model Response Probabilities

Assume a model generates each response with a probability proportional to the exponential of that response's reward score. Using the provided case study, compare the likely difference in generation probability between Response A and Response B, and contrast it with the difference between Response B and Response C. What does this reveal about how the formulation helps the model distinguish high-quality from low-quality responses?
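As a rough illustration of the relationship described above, the following sketch normalizes exponentiated rewards over a small candidate set. The reward scores for A, B, and C are hypothetical placeholders chosen for illustration, not values from the case study, and the function name `response_probabilities` is likewise an assumption.

```python
import math

def response_probabilities(rewards):
    """Return probabilities proportional to exp(reward) over a finite set.

    Subtracting the maximum reward before exponentiating avoids overflow
    and does not change the normalized result.
    """
    max_reward = max(rewards.values())
    weights = {name: math.exp(r - max_reward) for name, r in rewards.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical reward scores (assumed for illustration only).
rewards = {"A": 3.0, "B": 2.0, "C": 0.0}
probs = response_probabilities(rewards)

# The probability ratio between two responses depends only on the
# difference of their rewards: p(x) / p(y) = exp(r_x - r_y).
ratio_ab = probs["A"] / probs["B"]  # equals exp(3.0 - 2.0)
ratio_bc = probs["B"] / probs["C"]  # equals exp(2.0 - 0.0)

print(f"p(A)/p(B) = {ratio_ab:.2f}")
print(f"p(B)/p(C) = {ratio_bc:.2f}")
```

With these assumed scores, a reward gap of 1 makes A about 2.72 times as likely as B, while a gap of 2 makes B about 7.39 times as likely as C. Each additional unit of reward multiplies the probability ratio by e, so larger gaps in quality are amplified rather than scaled linearly.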

Updated 2025-10-03
