1Cademy

Learn Before

LLM Policy as a Probability Distribution

Multiple Choice

A research team is training a language model to act as a helpful assistant using methods from reinforcement learning. One researcher is focused on analyzing the model's 'policy' (π) for generating a response given a user's query. Another researcher is analyzing the model's 'conditional probability distribution' (Pr) over all possible responses for the same query. What is the relationship between the 'policy' and the 'conditional probability distribution' in this context?
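To illustrate the concept this question builds on (LLM Policy as a Probability Distribution), here is a minimal Python sketch. In the usual reinforcement-learning formulation for language models, the policy π(y | x) is the model's conditional probability distribution Pr(y | x) over complete responses y given a query x. For an autoregressive model it factors as a product of next-token probabilities. The toy model below (`next_token_probs`, its vocabulary, and its numbers) is entirely hypothetical and made up for illustration. It is not a real model or library API.

```python
import itertools
import random

# Hypothetical toy vocabulary; "<eos>" marks the end of a response.
VOCAB = ["yes", "no", "<eos>"]


def next_token_probs(query: str, prefix: tuple[str, ...]) -> dict[str, float]:
    """Hypothetical Pr(y_t | x, y_<t). Numbers are invented for illustration.
    After two content tokens, <eos> is forced so the response space is finite."""
    if len(prefix) >= 2:
        return {"yes": 0.0, "no": 0.0, "<eos>": 1.0}
    return {"yes": 0.5, "no": 0.3, "<eos>": 0.2}


def policy(response: tuple[str, ...], query: str) -> float:
    """pi(y | x): probability of the whole response y given query x,
    computed as Pr(y | x) = prod_t Pr(y_t | x, y_<t)."""
    prob = 1.0
    for t, token in enumerate(response):
        prob *= next_token_probs(query, response[:t])[token]
    return prob


def all_responses(max_len: int = 2) -> list[tuple[str, ...]]:
    """Enumerate every complete response: up to max_len content tokens, then <eos>."""
    content = [tok for tok in VOCAB if tok != "<eos>"]
    responses = []
    for n in range(max_len + 1):
        for words in itertools.product(content, repeat=n):
            responses.append(words + ("<eos>",))
    return responses


def sample_response(query: str, rng: random.Random) -> tuple[str, ...]:
    """Acting with the policy = sampling a response from Pr(. | x), token by token."""
    prefix: tuple[str, ...] = ()
    while True:
        probs = next_token_probs(query, prefix)
        token = rng.choices(list(probs), weights=list(probs.values()))[0]
        prefix += (token,)
        if token == "<eos>":
            return prefix


query = "Is the sky blue?"
# The policy's probabilities over all possible responses sum to 1,
# as any conditional probability distribution must.
print(round(sum(policy(y, query) for y in all_responses()), 10))  # 1.0
print(sample_response(query, random.Random(0)))
```

The sketch treats "the policy" and "the conditional distribution" as one object. `policy` scores a response under Pr(y | x), and `sample_response` draws a response from that same distribution, which is what generating with the policy means.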

Updated 2025-09-26
