Learn Before
Policy in the Context of LLMs
For a Large Language Model (LLM), the policy, denoted as π, represents the probability distribution over the vocabulary of possible next tokens, conditioned on the preceding sequence of tokens that constitutes the context. In essence, it is the strategy the LLM uses to decide which token to generate next.
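As a minimal sketch of this idea, the snippet below turns a model's raw output scores (logits) into a probability distribution over a toy vocabulary via softmax. The vocabulary, the logits, and the context string are all hypothetical illustrations, not outputs of any real model.

```python
import math

def policy(logits):
    """Map raw logits to a probability distribution over the vocabulary:
    pi(token | context) = softmax(logits)."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the context "The cat sat on the"
vocab = ["mat", "dog", "moon", "table"]
logits = [3.2, 0.5, -1.0, 1.8]
probs = policy(logits)
for tok, p in zip(vocab, probs):
    print(f"{tok}: {p:.3f}")
```

The resulting probabilities sum to 1, and sampling (or taking the argmax) from this distribution is exactly the "decide which token to generate next" step described above.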
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Policy in the Context of LLMs
Objective Function as Expected Cumulative Reward (Performance Function)
LLM Policy as a Probability Distribution
Identifying the Agent and Action in a Training Scenario
When a language model is fine-tuned using a system that incorporates human preferences, this process is often conceptualized within a reinforcement learning framework. Which of the following statements correctly analyzes the components of this interaction?
When training a language model using a framework that incorporates human feedback, standard reinforcement learning terminology is used. Match each reinforcement learning term on the left with its corresponding component or concept in this specific language model training context on the right.
Learn After
Policy Formula for LLMs in Reinforcement Learning
An autoregressive language model has processed the input 'The cat sat on the' and is now deciding the next word to generate. At this specific step, which of the following best describes the model's 'policy'?
Analyzing Language Model Generation Strategies
Nature of an LLM's Policy