1Cademy - Clipped Surrogate Objective Function

Learn Before

Surrogate Objective in Reinforcement Learning

Formula

Clipped Surrogate Objective Function

To address the high variance of policy gradient estimates and the training instability that results from it, a clipped surrogate objective function is widely used. This objective bounds the importance weight, that is, the probability ratio between the current policy $\pi_{\theta}$ and the reference policy $\pi_{\theta_{\mathrm{ref}}}$, so that a single policy update cannot become excessively large. For a trajectory $\tau$ of length $T$, the clipped utility function is formally defined as:

$U_{\mathrm{clip}}(\tau;\theta) = \sum_{t=1}^{T} \mathrm{Clip}\Big( \frac{\pi_{\theta}(a_t|s_t)}{\pi_{\theta_{\mathrm{ref}}}(a_t|s_t)} \Big) A(s_t,a_t)$

where $A(s_t,a_t)$ is the advantage of taking action $a_t$ in state $s_t$. The clipping operation restricts the probability ratio using a boundary hyperparameter $\epsilon$:

$\mathrm{Clip}\Big( \frac{\pi_{\theta}(a_t|s_t)}{\pi_{\theta_{\mathrm{ref}}}(a_t|s_t)} \Big) = \min\Big( \frac{\pi_{\theta}(a_t|s_t)}{\pi_{\theta_{\mathrm{ref}}}(a_t|s_t)}, \mathrm{bound}\big( \frac{\pi_{\theta}(a_t|s_t)}{\pi_{\theta_{\mathrm{ref}}}(a_t|s_t)}, 1 - \epsilon, 1 + \epsilon \big) \Big)$

Here $\mathrm{bound}(x, a, b)$ restricts $x$ to the interval $[a, b]$.
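The following is a minimal sketch of the clipped utility in plain Python. It assumes that the per-step action probabilities under the current policy and the reference policy, along with the advantages, are already available as lists, and that every reference probability is positive. The function names `bound`, `clipped_ratio`, and `clipped_utility` are hypothetical illustrations, not part of any library. The code implements the formula exactly as written above.

```python
from typing import Sequence


def bound(x: float, low: float, high: float) -> float:
    """Restrict x to the closed interval [low, high]."""
    return max(low, min(x, high))


def clipped_ratio(p_new: float, p_ref: float, epsilon: float) -> float:
    """Clip(r) = min(r, bound(r, 1 - eps, 1 + eps)), where r = pi_theta / pi_ref.

    Assumes p_ref > 0 (hypothetical helper for illustration).
    """
    ratio = p_new / p_ref  # importance weight for one step
    return min(ratio, bound(ratio, 1.0 - epsilon, 1.0 + epsilon))


def clipped_utility(
    p_new: Sequence[float],
    p_ref: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,
) -> float:
    """U_clip(tau; theta) = sum over t of Clip(r_t) * A(s_t, a_t) for one trajectory."""
    if not (len(p_new) == len(p_ref) == len(advantages)):
        raise ValueError("all per-step sequences must have the same length")
    return sum(
        clipped_ratio(pn, pr, epsilon) * adv
        for pn, pr, adv in zip(p_new, p_ref, advantages)
    )


if __name__ == "__main__":
    # Three steps with ratios 2.0, 0.5 and 1.0.
    pi_new = [0.5, 0.25, 0.5]
    pi_ref = [0.25, 0.5, 0.5]
    adv = [1.0, 1.0, -2.0]
    print(clipped_utility(pi_new, pi_ref, adv, epsilon=0.25))  # -0.25
```

In this example, the ratio 2.0 is capped at $1 + \epsilon = 1.25$. The ratio 0.5 passes through unchanged, because the $\min$ compares the raw ratio with its bounded value (0.75) and keeps the smaller of the two. With this form of $\mathrm{Clip}$, only ratios above $1 + \epsilon$ are actually limited. The commonly used PPO objective instead takes the minimum over the products, $\min\big(r_t A(s_t,a_t), \mathrm{bound}(r_t, 1-\epsilon, 1+\epsilon) A(s_t,a_t)\big)$. The two forms coincide when $A(s_t,a_t) \ge 0$.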

Updated 2026-05-01

References

Reference of Foundations of Large Language Models Course