1Cademy - Analyzing a Flawed Policy Gradient Derivation

Learn Before

Derivation of the Policy Gradient Objective Function

Short Answer

Analyzing a Flawed Policy Gradient Derivation

A student is attempting to derive the policy gradient objective function. Their derivation is shown below. Identify the specific mathematical error in their steps and explain why this error introduces a fundamental problem that the standard derivation avoids.

Derivation Steps:

Objective Function: $J(\theta) = \sum_{\tau} \text{Pr}_{\theta}(\tau)R(\tau)$
Gradient Calculation: $\frac{\partial J(\theta)}{\partial \theta} = \sum_{\tau} \left[ \frac{\partial \text{Pr}_{\theta}(\tau)}{\partial \theta} R(\tau) + \text{Pr}_{\theta}(\tau) \frac{\partial R(\tau)}{\partial \theta} \right]$
Conclusion: The student stops here, concluding that for this gradient to be useful, the reward function $R(\tau)$ must be differentiable with respect to the policy parameters $\theta$ .

0

1

Updated 2025-10-08

Contributors are:

Who are from:

Learn Before

Related