Concept

High Variance in Policy Gradient Estimates

A significant drawback of policy gradient methods is the high variance of their gradient estimates. Because each estimate is computed from a limited number of sampled trajectories, it can deviate substantially from the true gradient, and this noise can make learning unstable and sample-inefficient.
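To make the variance concrete, here is a minimal sketch on a toy problem that is not taken from the source: a two-action softmax policy with fixed rewards, where the score-function (REINFORCE-style) estimate of one gradient component is compared with its exact value. The rewards, the parameter values, and the helper names (`reinforce_estimate`, `exact_gradient`, `estimate_spread`) are all illustrative assumptions.

```python
import math
import random
import statistics


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_estimate(theta, rewards, batch_size, rng):
    """Monte Carlo estimate of d E[reward] / d theta[0] from sampled actions."""
    probs = softmax(theta)
    total = 0.0
    for _ in range(batch_size):
        action = rng.choices(range(len(probs)), weights=probs)[0]
        # Score function for a softmax policy:
        # d log pi(action) / d theta[0] = 1[action == 0] - pi(0)
        score = (1.0 if action == 0 else 0.0) - probs[0]
        total += rewards[action] * score
    return total / batch_size


def exact_gradient(theta, rewards):
    """Closed-form d E[reward] / d theta[0] = pi(0) * (r_0 - E[reward])."""
    probs = softmax(theta)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return probs[0] * (rewards[0] - expected)


def estimate_spread(theta, rewards, batch_size, trials, seed):
    """Mean and standard deviation of repeated gradient estimates."""
    rng = random.Random(seed)
    estimates = [
        reinforce_estimate(theta, rewards, batch_size, rng)
        for _ in range(trials)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    theta = [0.0, 0.0]      # hypothetical policy parameters (uniform policy)
    rewards = [10.0, 11.0]  # hypothetical fixed reward per action
    print("exact gradient:", exact_gradient(theta, rewards))
    for batch_size in (1, 10, 100):
        mean, std = estimate_spread(theta, rewards, batch_size, 500, seed=0)
        print(f"batch={batch_size:>3}  mean={mean:+.3f}  std={std:.3f}")
```

In this toy setting the exact gradient component is -0.25, while a single-sample estimate is either +5.0 or -5.5, so its standard deviation (5.25 analytically) is about twenty times the size of the quantity being estimated. Averaging over more samples shrinks the spread only in proportion to the square root of the batch size, which is one way to see why noisy estimates translate into inefficient learning.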

Updated 2026-05-01

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences