Surrogate Objective in Reinforcement Learning
In reinforcement learning, a surrogate objective is an alternative objective function that is optimized in place of the true performance function. A common example is the off-policy objective, which uses importance sampling to evaluate the current policy $\pi_\theta$ with data collected from a reference policy $\pi_{\text{ref}}$: $\hat{J}(\theta) = \mathbb{E}_{\tau \sim \pi_{\text{ref}}}\!\left[\frac{\pi_\theta(\tau)}{\pi_{\text{ref}}(\tau)}\, R(\tau)\right]$. This formulation acts as a proxy, or surrogate, for the true on-policy objective $J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)]$.
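The importance-sampling estimate above can be sketched in a few lines. This is a minimal illustration, not from the text; the two-arm setup, probabilities, and function names (`surrogate_estimate`, `prob_target`, `prob_ref`) are made-up assumptions.

```python
def surrogate_estimate(trajectories, prob_target, prob_ref, reward):
    """Monte Carlo estimate of the surrogate objective
    E_{tau ~ pi_ref}[ (pi_target(tau) / pi_ref(tau)) * R(tau) ]
    using trajectories that were sampled from the reference policy."""
    total = 0.0
    for tau in trajectories:
        weight = prob_target(tau) / prob_ref(tau)  # importance sampling ratio
        total += weight * reward(tau)
    return total / len(trajectories)

# Toy example: "trajectories" are single arm pulls (0 or 1).
# Reference policy picks arm 0 with prob 0.5; target picks it with prob 0.8.
prob_ref = lambda a: 0.5
prob_target = lambda a: 0.8 if a == 0 else 0.2
reward = lambda a: 1.0 if a == 0 else 0.0

batch = [0, 1, 0, 0]  # a small batch notionally drawn from the reference policy
estimate = surrogate_estimate(batch, prob_target, prob_ref, reward)
```

Each trajectory's reward is reweighted by the probability ratio, so data from the reference policy yields an (unbiased, if the batch is truly sampled from it) estimate of the target policy's performance.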
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Surrogate Objective in Reinforcement Learning
Equivalence of the Surrogate Objective and the On-Policy Objective
An agent's performance is being evaluated using a set of recorded experiences (trajectories) that were generated by an older, reference policy. The new, target policy being evaluated makes a specific high-reward trajectory significantly less probable than the reference policy did. How will the contribution of this specific high-reward trajectory be adjusted when estimating the performance of the new target policy?
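The adjustment in this scenario can be worked through numerically: the trajectory's reward is scaled by the ratio of its probability under the target policy to its probability under the reference policy. The probabilities and reward below are made-up illustration values, not from the text.

```python
def importance_weight(p_target, p_ref):
    """Importance sampling weight for one trajectory: the ratio of its
    probability under the target policy to that under the reference policy."""
    return p_target / p_ref

# A high-reward trajectory that the target policy makes much less
# probable than the reference policy did (hypothetical numbers).
p_ref = 0.20       # probability under the reference (behavior) policy
p_target = 0.02    # probability under the new target policy
reward = 10.0

w = importance_weight(p_target, p_ref)   # 0.1, well below 1
contribution = w * reward                # reward contribution scaled down to 1.0
```

Because the weight is below 1, the trajectory's reward is down-weighted in the estimate, reflecting how rarely the target policy would actually produce it.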
Off-Policy Performance Estimation
Consider an off-policy evaluation scenario where the performance of a 'target' policy is estimated using data collected from a 'reference' policy. If the target policy is identical to the reference policy, the importance sampling weight used to adjust the reward of every possible trajectory will be exactly 1.
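When the target and reference policies coincide, the probability ratio collapses to 1 for every trajectory, so the surrogate reduces to a plain on-policy average. A quick check, using made-up trajectory probabilities shared by both policies:

```python
# If target == reference, pi_target(tau) / pi_ref(tau) == 1 for every tau.
trajectory_probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical, shared by both policies

weights = [p_target / p_ref
           for p_target, p_ref in zip(trajectory_probs, trajectory_probs)]
```

Every weight is exactly 1, so each trajectory's reward enters the estimate unadjusted.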
Learn After
Equivalence of the Surrogate Objective and the On-Policy Objective
Surrogate Objective at the Policy Reference Point
Equivalence of Surrogate and On-Policy Gradients at the Reference Point
Training a Policy with Off-Distribution Data
A reinforcement learning agent is being updated. The current policy is denoted by $\pi_\theta$, and a batch of trajectory data has been collected using a previous, fixed policy, $\pi_{\text{old}}$. To improve the current policy using this existing data, the following objective function is optimized: $\hat{J}(\theta) = \mathbb{E}_{\tau \sim \pi_{\text{old}}}\!\left[\frac{\pi_\theta(\tau)}{\pi_{\text{old}}(\tau)}\, R(\tau)\right]$. Which statement best analyzes the role of this objective function in the training process?
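Optimizing this objective can be sketched with a toy two-action bandit: actions are drawn once from the fixed old policy, and the same batch is then reused for several gradient steps on $\theta$. Everything below (the parameterization $\pi_\theta(0)=\theta$, the step size, the reward values) is an illustrative assumption, not from the text.

```python
def surrogate(theta, actions, pi_old, rewards):
    """Monte Carlo estimate of E_{a~pi_old}[(pi_theta(a)/pi_old(a)) * R(a)],
    where pi_theta(0) = theta and pi_theta(1) = 1 - theta."""
    total = 0.0
    for a in actions:
        p_theta = theta if a == 0 else 1.0 - theta
        total += (p_theta / pi_old[a]) * rewards[a]
    return total / len(actions)

def surrogate_grad(theta, actions, pi_old, rewards):
    """Gradient of the estimate above with respect to theta."""
    total = 0.0
    for a in actions:
        d_p_theta = 1.0 if a == 0 else -1.0  # d pi_theta(a) / d theta
        total += (d_p_theta / pi_old[a]) * rewards[a]
    return total / len(actions)

# Data collected once under pi_old, then reused for several updates.
pi_old = [0.5, 0.5]       # old policy's action probabilities
rewards = [1.0, 0.0]      # action 0 is the rewarding one
actions = [0, 1, 0, 1]    # batch notionally sampled from pi_old

theta = 0.5
for _ in range(3):        # ascend the surrogate without collecting new data
    step = 0.1 * surrogate_grad(theta, actions, pi_old, rewards)
    theta = min(0.99, max(0.01, theta + step))
```

The key point the question targets: the gradient is taken on the reweighted (surrogate) objective, which lets old data stand in for fresh on-policy samples, and here it pushes `theta` toward the rewarding action.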
Rationale for Using a Surrogate Objective
Separation of Sampling and Reward Computation in Policy Learning
Variance in Surrogate Objective Gradient Estimates
Clipped Surrogate Objective Function