Concept

Variance in Surrogate Objective Gradient Estimates

A primary challenge when utilizing an unclipped surrogate objective function for policy optimization is that the variance in its resulting gradient estimates is typically quite high. This elevated variance introduces significant noise into the parameter updates, which can destabilize the overall learning process and hinder reliable convergence.

0

1

Updated 2026-05-01

Contributors are:

Who are from:

Tags

Foundations of Large Language Models

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences