Short Answer

Analyzing the Impact of a State-Value Baseline

In a policy gradient algorithm, the state-value function is often used as a baseline to reduce the variance of gradient estimates. Explain how subtracting the state-value (the expected future reward from the current state) from the observed return helps to differentiate between a 'good' action taken in a 'bad' state and a 'mediocre' action taken in a 'good' state. How does this differentiation lead to a more stable learning signal?
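
To make the scenario in the question concrete, here is a minimal numeric sketch. All state names, action labels, and values are hypothetical and chosen only for illustration; it assumes the baseline-corrected signal is the observed return minus the state-value, G - V(s).

```python
# Hypothetical numbers, chosen only for illustration.
# V(s): expected return when starting from state s.
state_value = {"bad_state": -10.0, "good_state": 10.0}

# Each sample: (state, action label, observed return G).
samples = [
    ("bad_state", "good_action", -8.0),      # better than usual for a bad state
    ("good_state", "mediocre_action", 9.0),  # worse than usual for a good state
]


def advantage(state, observed_return):
    """Return G - V(s), the baseline-corrected learning signal."""
    return observed_return - state_value[state]


raw_returns = {action: g for _, action, g in samples}
advantages = {action: advantage(state, g) for state, action, g in samples}

for action in advantages:
    print(action, "raw return:", raw_returns[action], "advantage:", advantages[action])
```

With these assumed values, the raw returns alone would rank the mediocre action above the good one, because they mostly reflect how good the state was. After subtracting the state-value, the good action gets a positive signal and the mediocre action a negative one, and both signals are much smaller in magnitude than the raw returns.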

Updated 2025-10-07

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science