Short Answer

Rationale for Using a Surrogate Objective

In the context of policy optimization, an agent's performance is ultimately measured by the on-policy objective, $\mathbb{E}_{\tau \sim \pi_{\theta}} [R(\tau)]$. However, many algorithms instead optimize a surrogate objective, such as $\mathbb{E}_{\tau \sim \pi_{\theta_{\text{ref}}}} \left[ \frac{\text{Pr}_{\theta}(\tau)}{\text{Pr}_{\theta_{\text{ref}}}(\tau)} R(\tau) \right]$, where data is sampled from a reference policy $\pi_{\theta_{\text{ref}}}$. Analyze the primary practical advantage of optimizing this surrogate objective compared to directly optimizing the on-policy objective.
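
To make the estimator behind the surrogate concrete, here is a minimal PyTorch sketch (not part of the original question): it reweights rewards by the likelihood ratio $\text{Pr}_{\theta}(\tau)/\text{Pr}_{\theta_{\text{ref}}}(\tau)$ so that a batch sampled once from $\pi_{\theta_{\text{ref}}}$ can be reused across several updates to $\theta$. The function name, batch size, and learning rate are illustrative assumptions.

```python
import torch

def surrogate_objective(logp_theta, logp_ref, rewards):
    # Monte Carlo estimate of E_{tau ~ pi_ref}[ Pr_theta(tau)/Pr_ref(tau) * R(tau) ],
    # computed from trajectories sampled once from the reference policy pi_ref.
    ratio = torch.exp(logp_theta - logp_ref)  # likelihood ratio, formed in log space
    return (ratio * rewards).mean()

# Hypothetical batch: log-probs and rewards for 64 trajectories drawn from pi_ref.
logp_ref = torch.randn(64)
rewards = torch.randn(64)
# Stand-in for log Pr_theta(tau); in practice this comes from the current policy network.
logp_theta = logp_ref.clone().requires_grad_(True)

# The same fixed batch supports multiple gradient steps on theta with no fresh
# rollouts -- the practical payoff of optimizing the off-policy surrogate.
for _ in range(4):
    loss = -surrogate_objective(logp_theta, logp_ref, rewards)
    loss.backward()
    with torch.no_grad():
        logp_theta -= 0.01 * logp_theta.grad
    logp_theta.grad = None
```

By contrast, the on-policy objective $\mathbb{E}_{\tau \sim \pi_{\theta}}[R(\tau)]$ would require resampling trajectories from the updated policy after every gradient step.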

Updated 2025-10-07

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science