Definition

Score Function in Policy Gradients

The score function in policy gradient methods is the gradient of the log-probability of a trajectory $\tau$ with respect to the policy parameters $\theta$. It is written $\nabla_{\theta} \log \text{Pr}_{\theta}(\tau)$ or, in partial-derivative notation, $\frac{\partial \log \text{Pr}_{\theta}(\tau)}{\partial \theta}$. This term is fundamental to deriving the policy gradient, as it gives the direction in parameter space along which the log-probability (and hence the probability) of a given trajectory increases most steeply.
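As a rough illustration, the score function can be computed by hand for a small policy. The sketch below assumes a tabular softmax policy, where `theta[s][a]` is the logit of action `a` in state `s`; that parameterization, and the names `softmax` and `trajectory_score`, are hypothetical choices made for this example. It relies on the standard observation that the initial-state and transition probabilities do not depend on $\theta$, so $\nabla_{\theta} \log \text{Pr}_{\theta}(\tau)$ reduces to the sum over time steps of $\nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)$. For a softmax policy, each of those terms is a one-hot vector for the chosen action minus the action probabilities.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def trajectory_score(theta, states, actions):
    """Gradient of log Pr_theta(tau) w.r.t. theta for a tabular softmax policy.

    Assumption for this sketch: theta[s][a] is the logit of action a in state s,
    and the environment dynamics do not depend on theta.
    """
    grad = [[0.0] * len(row) for row in theta]
    for s, a in zip(states, actions):
        probs = softmax(theta[s])
        for j, p in enumerate(probs):
            # d log pi(a|s) / d theta[s][j] = 1[j == a] - pi(j|s)
            grad[s][j] += (1.0 if j == a else 0.0) - p
    return grad


# A toy trajectory: visit states 0, 1, 0 and take actions 2, 0, 2.
theta = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
states = [0, 1, 0]
actions = [2, 0, 2]
score = trajectory_score(theta, states, actions)
for row in score:
    print(row)
```

With all logits at zero the policy is uniform, so the entries for the actions actually taken come out positive and the others negative: stepping $\theta$ along this score raises the probability of the sampled trajectory. Each row sums to zero because softmax probabilities sum to one.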

Updated 2025-10-07

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences