1Cademy - Reasoning for Objective Simplification

Learn Before

Surrogate Objective at the Policy Reference Point

Short Answer

Reasoning for Objective Simplification

In a policy optimization scenario, an objective function is defined as: $J(\theta) = \mathbb{E}_{\tau \sim \pi_{\theta_{\text{ref}}}} \left[ \frac{\text{Pr}_{\theta}(\tau)}{\text{Pr}_{\theta_{\text{ref}}}(\tau)} R(\tau) \right]$. Explain, step by step, why this objective simplifies to $\mathbb{E}_{\tau \sim \pi_{\theta_{\text{ref}}}} [R(\tau)]$ when the policy parameters $\theta$ are set equal to the reference policy parameters $\theta_{\text{ref}}$.
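
One way to lay out the reasoning is sketched below. It is not an official answer, and it rests on one assumption not stated in the prompt: every trajectory $\tau$ drawn from $\pi_{\theta_{\text{ref}}}$ has $\text{Pr}_{\theta_{\text{ref}}}(\tau) > 0$, so the ratio is well defined. Substituting $\theta = \theta_{\text{ref}}$ makes the numerator and denominator of the probability ratio identical, so the ratio equals 1 for every sampled trajectory and drops out of the expectation:

$$
\begin{aligned}
J(\theta_{\text{ref}}) &= \mathbb{E}_{\tau \sim \pi_{\theta_{\text{ref}}}} \left[ \frac{\text{Pr}_{\theta_{\text{ref}}}(\tau)}{\text{Pr}_{\theta_{\text{ref}}}(\tau)} R(\tau) \right] && \text{(substitute } \theta = \theta_{\text{ref}}\text{)} \\
&= \mathbb{E}_{\tau \sim \pi_{\theta_{\text{ref}}}} \left[ 1 \cdot R(\tau) \right] && \text{(the ratio is 1 wherever } \text{Pr}_{\theta_{\text{ref}}}(\tau) > 0\text{)} \\
&= \mathbb{E}_{\tau \sim \pi_{\theta_{\text{ref}}}} \left[ R(\tau) \right]
\end{aligned}
$$

Note that the sampling distribution $\pi_{\theta_{\text{ref}}}$ never depends on $\theta$, so only the ratio inside the brackets changes under the substitution. The result is the expected reward of the reference policy itself.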

Updated 2025-10-04
