Concept

Unbiased Nature of Policy Gradient with Baseline

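A key property of a baseline in policy gradient methods is that it introduces no bias into the gradient estimate, provided the baseline does not depend on the action being taken. A well-chosen baseline lowers the variance of the estimate while leaving its expected value unchanged, because the term the baseline multiplies, the gradient of the log-probability of the sampled action, has zero expectation under the policy. As a result, the policy updates still point, on average, in the direction that improves the policy.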
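The property can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions: a three-action softmax policy in a single state, fixed per-action rewards, and the expected reward used as the baseline. The logits, rewards, and function names are hypothetical and chosen for the example. Expectations are computed exactly by summing over actions, so no sampling noise is involved.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_prob(probs, action):
    # Gradient of log pi(action) w.r.t. the logits of a softmax policy:
    # one_hot(action) - probs.
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]

def expected_gradient(probs, rewards, baseline):
    # Exact expectation of (reward - baseline) * grad log pi(action).
    n = len(probs)
    grad = [0.0] * n
    for a in range(n):
        g = grad_log_prob(probs, a)
        for i in range(n):
            grad[i] += probs[a] * (rewards[a] - baseline) * g[i]
    return grad

def total_variance(probs, rewards, baseline):
    # Exact variance of the single-sample estimate, summed over components.
    n = len(probs)
    mean = expected_gradient(probs, rewards, baseline)
    var = 0.0
    for a in range(n):
        g = grad_log_prob(probs, a)
        for i in range(n):
            sample = (rewards[a] - baseline) * g[i]
            var += probs[a] * (sample - mean[i]) ** 2
    return var

# Hypothetical toy values, assumed for illustration only.
logits = [0.2, -0.5, 1.0]
rewards = [1.0, 3.0, 2.0]
probs = softmax(logits)

# Action-independent baseline: the expected reward under the policy.
baseline = sum(p * r for p, r in zip(probs, rewards))

grad_plain = expected_gradient(probs, rewards, 0.0)
grad_base = expected_gradient(probs, rewards, baseline)
var_plain = total_variance(probs, rewards, 0.0)
var_base = total_variance(probs, rewards, baseline)

print("expected gradient, no baseline:  ", grad_plain)
print("expected gradient, with baseline:", grad_base)
print("variance, no baseline:  ", var_plain)
print("variance, with baseline:", var_base)
```

In this toy setting the two expected gradients agree up to floating-point rounding, while the variance with the baseline is smaller. The expected reward is used here only as a convenient action-independent choice; a poorly chosen baseline would still be unbiased but could raise the variance instead.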

Updated 2025-10-06

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences