Formula

Policy Notation for Autoregressive Models (π_θ)

The notation $\pi_\theta(y_t \mid \mathbf{X}, \mathbf{y}_{<t})$ is often used to represent the policy of an autoregressive model. It denotes the conditional probability of selecting output $y_t$ at time step $t$, given a context $\mathbf{X}$ and the sequence of previously generated outputs $\mathbf{y}_{<t}$. This policy is governed by the model's parameters $\theta$. The notation is functionally equivalent to the standard probability notation $\Pr_\theta(y_t \mid \mathbf{X}, \mathbf{y}_{<t})$.
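
To make the notation concrete, here is a minimal Python sketch of how such a policy could be evaluated. It is an illustration only, resting on several assumptions: `toy_logits` is a hypothetical stand-in for a real model with parameters θ, the three-token vocabulary is invented, and the policy is assumed to be a softmax over the model's scores. The last function uses the standard autoregressive factorization, in which the log-probability of a whole sequence is the sum of the per-step log-probabilities.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(context, prefix, vocab_size=3):
    """Hypothetical stand-in for a model with parameters theta.

    Returns one score per vocabulary token, computed deterministically
    from the context X and the prefix y_<t. A real model would compute
    these scores with a neural network.
    """
    base = sum(context) + sum(prefix) + len(prefix)
    return [float((base + k) % vocab_size) for k in range(vocab_size)]


def policy(y_t, context, prefix):
    """pi_theta(y_t | X, y_<t): probability of token y_t at the current step."""
    return softmax(toy_logits(context, prefix))[y_t]


def sequence_log_prob(y, context):
    """log pi_theta(y | X) as the sum over t of log pi_theta(y_t | X, y_<t)."""
    return sum(
        math.log(policy(y[t], context, y[:t])) for t in range(len(y))
    )


if __name__ == "__main__":
    X = [1, 2]     # context, encoded as token ids (assumed)
    y = [0, 2, 1]  # generated sequence, encoded as token ids (assumed)
    for t in range(len(y)):
        print(f"t={t}: pi(y_t | X, y_<t) = {policy(y[t], X, y[:t]):.4f}")
    print(f"log pi(y | X) = {sequence_log_prob(y, X):.4f}")
```

For any fixed context and prefix, the values of `policy` over all tokens in the vocabulary sum to 1, which is what makes $\pi_\theta(\cdot \mid \mathbf{X}, \mathbf{y}_{<t})$ a distribution over the next output.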

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences