Score Function in Policy Gradients
The score function in policy gradient methods is the gradient of the log-probability of a trajectory with respect to the policy parameters θ. It is expressed as ∇_θ log Pr_θ(τ) or, using partial derivative notation, ∂log Pr_θ(τ)/∂θ. This term is fundamental to deriving the policy gradient objective, as it indicates the direction in parameter space that increases the probability of a given trajectory.
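As a minimal sketch (assuming a categorical softmax policy over a single decision, with a hypothetical parameter vector `theta`), the score function ∇_θ log π_θ(a) can be computed analytically and checked against a finite-difference gradient:

```python
import numpy as np

def log_prob(theta, a):
    # Log-probability of action a under a softmax policy:
    # pi_theta(a) = exp(theta[a]) / sum_b exp(theta[b])
    z = theta - theta.max()          # subtract max for numerical stability
    return z[a] - np.log(np.exp(z).sum())

def score(theta, a):
    # Analytic score function: grad_theta log pi_theta(a) = one_hot(a) - pi_theta
    p = np.exp(theta - theta.max())
    p /= p.sum()
    g = -p
    g[a] += 1.0
    return g

theta = np.array([0.5, -1.0, 2.0])   # illustrative values only
a = 1

# Central finite-difference check of the analytic score
eps = 1e-6
fd = np.array([
    (log_prob(theta + eps * np.eye(3)[i], a)
     - log_prob(theta - eps * np.eye(3)[i], a)) / (2 * eps)
    for i in range(3)
])
print(np.allclose(score(theta, a), fd, atol=1e-5))  # True
```

Note that the score vector sums to zero for a softmax policy: raising the probability of one action necessarily lowers the others.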

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Policy Gradient Theorem
Advantage of Policy Gradients: Non-Differentiable Reward Functions
Decomposition of the Trajectory Log-Probability Gradient
Policy Gradient Objective with Advantage Function
Policy Gradient Estimate under Uniform Trajectory Probability
During the derivation of the policy performance gradient, a key step transforms the expression Σ [∂Pr_θ(τ)/∂θ] R(τ) into a form that includes the term ∂log Pr_θ(τ)/∂θ. What is the primary analytical purpose of this transformation?

The following equations represent key steps in deriving the policy gradient. Arrange them in the correct logical order, starting from the initial gradient of the objective function to its final form as an expectation. Note: J(θ) is the objective function, Pr_θ(τ) is the probability of a trajectory τ under policy parameters θ, and R(τ) is the reward for that trajectory.
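For reference, the transformation these questions concern is the standard log-derivative identity, restated here in the document's notation:

```latex
\frac{\partial \Pr_\theta(\tau)}{\partial \theta}
  = \Pr_\theta(\tau)\,\frac{\partial \log \Pr_\theta(\tau)}{\partial \theta}
\;\Longrightarrow\;
\sum_\tau \frac{\partial \Pr_\theta(\tau)}{\partial \theta}\,R(\tau)
  = \sum_\tau \Pr_\theta(\tau)\,\frac{\partial \log \Pr_\theta(\tau)}{\partial \theta}\,R(\tau)
  = \mathbb{E}_{\tau \sim \Pr_\theta}\!\left[\frac{\partial \log \Pr_\theta(\tau)}{\partial \theta}\,R(\tau)\right]
```

Rewriting the sum as an expectation is what allows the gradient to be estimated from sampled trajectories.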
Analyzing a Flawed Policy Gradient Derivation
Learn After
In a policy gradient method, an agent executes a specific trajectory τ. The score function, defined as the gradient of the log-probability of this trajectory with respect to the policy parameters (∇_θ log Pr_θ(τ)), is calculated. What is the fundamental interpretation of this score function vector?
Applying the Score Function in Policy Updates
Analyzing Policy Updates with the Score Function