1Cademy

Learn Before

Policy Proportional to Exponentiated Reward

Multiple Choice

A language model's policy, which determines the probability of generating an output y given an input x, is structured to be proportional to the exponential of a reward score r(x, y). For a specific input, two potential outputs have the following reward scores:

Output A: Reward = 3.0
Output B: Reward = 1.0

Based on this formulation, how does the probability of generating Output A compare to the probability of generating Output B?
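The comparison can be worked through numerically. The sketch below assumes the policy is exactly proportional to exp(r(x, y)), with no temperature or reference-policy term, since the question mentions neither. Under that assumption the normalizing constant cancels in the ratio, so P(A) / P(B) = exp(3.0 - 1.0) = e^2, roughly 7.39. The function names are illustrative, and treating A and B as the only two candidate outputs is an assumption made solely to show normalized probabilities; the ratio itself does not depend on it.

```python
import math


def probability_ratio(reward_a: float, reward_b: float) -> float:
    """Ratio P(A) / P(B) when P(y | x) is proportional to exp(r(x, y)).

    The shared normalizing constant cancels, leaving exp(r_A - r_B).
    """
    return math.exp(reward_a - reward_b)


def normalized_policy(rewards: list[float]) -> list[float]:
    """Softmax over a finite candidate set (hypothetical: assumes the
    listed outputs are the only ones the model can produce)."""
    max_reward = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp(r - max_reward) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]


ratio = probability_ratio(3.0, 1.0)
probs = normalized_policy([3.0, 1.0])

print(f"P(A) / P(B) = {ratio:.3f}")
print(f"P(A) = {probs[0]:.4f}, P(B) = {probs[1]:.4f}")
```

Only the difference in rewards matters: adding the same constant to both scores leaves the ratio unchanged.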

Updated 2025-09-26

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course