Score Function in Policy Gradients
The score function in policy gradient methods is the gradient of the log-probability of a trajectory with respect to the policy parameters θ. It is expressed as ∇_θ log Pr_θ(τ) or, using partial derivative notation, ∂log Pr_θ(τ)/∂θ. This term is fundamental to deriving the policy gradient objective, as it indicates the direction in parameter space that increases the probability of a given trajectory.
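As a minimal sketch (assuming a categorical softmax policy over a single decision, with a hypothetical parameter vector `theta`), the score function ∇_θ log π_θ(a) can be computed analytically and checked against a finite-difference gradient:

```python
import numpy as np

def log_prob(theta, a):
    # Log-probability of action a under a softmax policy:
    # pi_theta(a) = exp(theta[a]) / sum_b exp(theta[b])
    z = theta - theta.max()          # subtract max for numerical stability
    return z[a] - np.log(np.exp(z).sum())

def score(theta, a):
    # Analytic score function: grad_theta log pi_theta(a) = one_hot(a) - pi_theta
    p = np.exp(theta - theta.max())
    p /= p.sum()
    g = -p
    g[a] += 1.0
    return g

theta = np.array([0.5, -1.0, 2.0])   # illustrative values only
a = 1

# Central finite-difference check of the analytic score
eps = 1e-6
fd = np.array([
    (log_prob(theta + eps * np.eye(3)[i], a)
     - log_prob(theta - eps * np.eye(3)[i], a)) / (2 * eps)
    for i in range(3)
])
print(np.allclose(score(theta, a), fd, atol=1e-5))  # True
```

Note that the score vector sums to zero for a softmax policy: raising the probability of one action necessarily lowers the others.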

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Policy Gradient Theorem
Advantage of Policy Gradients: Non-Differentiable Reward Functions
Decomposition of the Trajectory Log-Probability Gradient
Policy Gradient Objective with Advantage Function
Policy Gradient Estimate under Uniform Trajectory Probability
During the derivation of the policy performance gradient, a key step transforms the expression Σ [∂Pr_θ(τ)/∂θ] R(τ) into a form that includes the term ∂log Pr_θ(τ)/∂θ. What is the primary analytical purpose of this transformation?

The following equations represent key steps in deriving the policy gradient. Arrange them in the correct logical order, starting from the initial gradient of the objective function to its final form as an expectation. Note: J(θ) is the objective function, Pr_θ(τ) is the probability of a trajectory τ under policy parameters θ, and R(τ) is the reward for that trajectory.
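For reference, the transformation these questions concern is the standard log-derivative identity, restated here in the document's notation:

```latex
\frac{\partial \Pr_\theta(\tau)}{\partial \theta}
  = \Pr_\theta(\tau)\,\frac{\partial \log \Pr_\theta(\tau)}{\partial \theta}
\;\Longrightarrow\;
\sum_\tau \frac{\partial \Pr_\theta(\tau)}{\partial \theta}\,R(\tau)
  = \sum_\tau \Pr_\theta(\tau)\,\frac{\partial \log \Pr_\theta(\tau)}{\partial \theta}\,R(\tau)
  = \mathbb{E}_{\tau \sim \Pr_\theta}\!\left[\frac{\partial \log \Pr_\theta(\tau)}{\partial \theta}\,R(\tau)\right]
```

Rewriting the sum as an expectation is what allows the gradient to be estimated from sampled trajectories.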
Analyzing a Flawed Policy Gradient Derivation
Learn After
In a policy gradient method, an agent executes a specific trajectory τ. The score function, defined as the gradient of the log-probability of this trajectory with respect to the policy parameters (∇_θ log Pr_θ(τ)), is calculated. What is the fundamental interpretation of this score function vector?
Applying the Score Function in Policy Updates
Analyzing Policy Updates with the Score Function