Variance in Surrogate Objective Gradient Estimates
A primary challenge in using an unclipped surrogate objective for policy optimization is the high variance of its gradient estimates. Because the objective reweights data sampled from $\pi_{\theta_{\text{old}}}$ by the importance ratio $\pi_\theta(a \mid s) / \pi_{\theta_{\text{old}}}(a \mid s)$, these ratios can grow large as the current policy drifts away from the sampling policy, injecting significant noise into the parameter updates, destabilizing learning, and hindering reliable convergence.
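To make this concrete, the following is a minimal sketch (not from the course material) that Monte Carlo estimates the unclipped surrogate gradient on a toy softmax bandit and measures how its variance grows as $\pi_\theta$ drifts away from $\pi_{\theta_{\text{old}}}$. The action space, advantage values, and drift schedule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

n_actions = 3
advantages = np.array([1.0, -0.5, 0.2])  # assumed, fixed per-action advantages

theta_old = np.array([2.0, 0.0, 0.0])    # parameters of the sampling policy
pi_old = softmax(theta_old)

def surrogate_grad_samples(theta, n_samples=100_000):
    """Per-sample gradients of the unclipped surrogate
    L(theta) = E_{a ~ pi_old}[(pi_theta(a) / pi_old(a)) * A(a)],
    where each sample contributes ratio * grad log pi_theta(a) * A(a)."""
    pi = softmax(theta)
    actions = rng.choice(n_actions, size=n_samples, p=pi_old)
    ratios = pi[actions] / pi_old[actions]
    # For a softmax policy, grad log pi_theta(a) = one_hot(a) - pi.
    grad_log = np.eye(n_actions)[actions] - pi
    return ratios[:, None] * advantages[actions, None] * grad_log

# Drift the current policy toward an action that pi_old rarely samples:
# the importance ratios blow up, and the gradient-estimate variance with them.
for shift in [0.0, 1.5, 3.0]:
    theta = theta_old.copy()
    theta[1] += shift
    grads = surrogate_grad_samples(theta)
    ratio_drifted = softmax(theta)[1] / pi_old[1]
    print(f"shift={shift:.1f}  ratio on drifted action={ratio_drifted:.2f}  "
          f"total gradient variance={grads.var(axis=0).sum():.4f}")
```

Bounding exactly these ratios is what the clipped surrogate objective (see Related below) is designed to do.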
Tags
Foundations of Large Language Models
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Equivalence of the Surrogate Objective and the On-Policy Objective
Surrogate Objective at the Policy Reference Point
Equivalence of Surrogate and On-Policy Gradients at the Reference Point
Training a Policy with Off-Distribution Data
A reinforcement learning agent is being updated. The current policy is denoted by $\pi_\theta$, and a batch of trajectory data has been collected using a previous, fixed policy, $\pi_{\theta_{\text{old}}}$. To improve the current policy using this existing data, the following objective function is optimized: $\mathcal{L}(\theta) = \mathbb{E}_{(s,a) \sim \pi_{\theta_{\text{old}}}}\!\left[\frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\, \hat{A}(s, a)\right]$. Which statement best analyzes the role of this objective function in the training process?
Rationale for Using a Surrogate Objective
Separation of Sampling and Reward Computation in Policy Learning
Clipped Surrogate Objective Function