Definition

LLM Policy as a Probability Distribution

In the context of reinforcement learning, the policy of a Large Language Model agent is the model's probability distribution over possible outputs. This policy, often denoted by $\pi$, is equivalent to the conditional probability of generating an output sequence $y$ given an input context $x$. This relationship is expressed as $\pi(y \mid x) = \Pr(y \mid x)$.
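
To make the definition concrete, here is a minimal Python sketch of evaluating $\log \pi(y \mid x)$ for a given output sequence. It assumes the usual autoregressive factorization for language models, $\Pr(y \mid x) = \prod_t \Pr(y_t \mid x, y_{<t})$, which the definition above does not spell out. The vocabulary `VOCAB` and the function `toy_logits` are hypothetical stand-ins for a real tokenizer and model forward pass, not part of any library.

```python
import math

# Hypothetical three-token vocabulary, for illustration only.
VOCAB = ["a", "b", "<eos>"]


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(context, prefix):
    """Hypothetical stand-in for an LLM forward pass.

    Returns one score per vocabulary token. This toy version ignores the
    context and depends only on how many tokens have been generated so far.
    """
    step = len(prefix)
    return [1.0 - step, 0.5, float(step)]


def policy_log_prob(context, output_tokens):
    """Compute log pi(y | x) as a sum of per-token log-probabilities."""
    log_p = 0.0
    for t, token in enumerate(output_tokens):
        # Distribution over the next token given x and the tokens y_<t.
        probs = softmax(toy_logits(context, output_tokens[:t]))
        log_p += math.log(probs[VOCAB.index(token)])
    return log_p


if __name__ == "__main__":
    x = "some input context"
    y = ["a", "b", "<eos>"]
    print("log pi(y|x) =", policy_log_prob(x, y))
```

Working in log space avoids numerical underflow when many per-token probabilities are multiplied together; exponentiating the result recovers $\pi(y \mid x)$ itself.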


Updated 2026-05-02


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models

Related