1Cademy - Role of the Advantage Function in Policy Updates

Learn Before

Policy Gradient with Advantage Function Formula

Short Answer

Role of the Advantage Function in Policy Updates

Consider the following gradient estimate, used to update an agent's policy parameters $\theta$ from a collection of sampled trajectories $\mathcal{D}$:

$\frac{\partial J(\theta)}{\partial \theta} \approx \frac{1}{|\mathcal{D}|} \sum_{\tau \in \mathcal{D}} \frac{\partial}{\partial \theta} \left( \sum_{t=1}^{T} \log \pi_{\theta}(a_t|s_t) A(s_t, a_t) \right)$

In the context of this update rule, briefly explain the primary role of the term $A(s_t, a_t)$ and how its sign (positive or negative) influences the adjustment of the policy $\pi_{\theta}$.
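
To experiment with the question, the sketch below evaluates a single term of the sum, $\log \pi_{\theta}(a_t|s_t) A(s_t, a_t)$, under simplifying assumptions that are not part of the original formula: one state, three actions, and a softmax policy whose parameters $\theta$ are the action logits. The helper names (`softmax`, `grad_log_prob`, `ascent_step`) and the learning rate of 0.5 are hypothetical, chosen only for illustration.

```python
import math

def softmax(logits):
    """Convert logits into action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_prob(logits, action):
    """Gradient of log pi_theta(action) w.r.t. the logits of a softmax policy.

    For a softmax policy this is one_hot(action) - pi_theta.
    """
    probs = softmax(logits)
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]

def ascent_step(logits, action, advantage, lr=0.5):
    """One gradient-ascent step on log pi_theta(action) * advantage.

    The advantage is treated as a constant weight on the gradient.
    lr is an assumed, illustrative learning rate.
    """
    grad = grad_log_prob(logits, action)
    return [x + lr * advantage * g for x, g in zip(logits, grad)]

theta = [0.0, 0.0, 0.0]   # assumed starting point: uniform policy
action = 1                # the action that was sampled

p_before = softmax(theta)[action]
p_after_pos = softmax(ascent_step(theta, action, advantage=+1.0))[action]
p_after_neg = softmax(ascent_step(theta, action, advantage=-1.0))[action]

print("before:         ", p_before)
print("after, A = +1.0:", p_after_pos)
print("after, A = -1.0:", p_after_neg)
```

Comparing the three printed probabilities for the sampled action, and repeating the run with larger or smaller advantage magnitudes, gives a concrete check on a written answer.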

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences