Definition

Policy in the Context of LLMs

For a Large Language Model (LLM), the policy, denoted π, is the probability distribution over the vocabulary of possible next tokens, conditioned on the preceding sequence of tokens, which constitutes the context. In essence, it is the strategy the LLM uses to decide which token to generate next.
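
As an illustration of this definition, the following minimal sketch treats the policy as a softmax over next-token scores for one fixed context. The three-token vocabulary, the scores in `toy_logits`, and the helper names `policy` and `sample_next_token` are all hypothetical and chosen only for illustration; a real LLM computes such scores with a neural network over a vocabulary of many thousands of tokens.

```python
import math
import random


def policy(logits):
    """Map raw next-token scores to a probability distribution (softmax)."""
    highest = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - highest) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def sample_next_token(probs, rng):
    """Draw one token at random according to the policy's probabilities."""
    tokens = list(probs)
    weights = [probs[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical scores a model might assign after the context "The cat sat on the"
toy_logits = {"mat": 2.0, "sofa": 1.0, "moon": -1.0}

pi = policy(toy_logits)                   # distribution over the toy vocabulary
greedy_token = max(pi, key=pi.get)        # pick the most probable token
sampled_token = sample_next_token(pi, random.Random(0))  # or sample from pi
```

Greedy selection and sampling are two common ways to turn the same distribution into a single generated token; the policy itself is the distribution `pi`, not either decoding choice.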

Updated 2026-05-02

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models