Learn Before
Policy Update Analysis
An agent is being trained using an actor-critic method. The actor's objective is to adjust its policy, π, to maximize expected rewards by minimizing the following loss function: L(θ) = -E[log π(a|s) * A(s,a)], where A(s,a) is the advantage of taking action a in state s. In a particular state, the critic calculates the advantages for three possible actions as shown in the case study. Based on this single-step observation, which action's probability will the policy be most strongly encouraged to increase? Justify your answer by explaining how the components of the loss function drive this update.
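A minimal PyTorch sketch of this update may help. The advantage values below are hypothetical (the case-study numbers are not reproduced here), and the learning rate is a toy choice; the point is that one gradient-descent step on L(θ) shifts the most probability mass toward the action with the largest positive advantage:

```python
import torch

# Hypothetical advantage values for the three actions; the original
# case-study numbers are not reproduced here.
advantages = torch.tensor([0.5, 2.0, -1.0])

# Start from a uniform policy over the three actions.
logits = torch.zeros(3, requires_grad=True)
log_probs = torch.log_softmax(logits, dim=-1)

# Actor loss L(theta) = -E[ log pi(a|s) * A(s,a) ], here estimated by
# weighting every action's log-probability by its advantage.
loss = -(log_probs * advantages).sum()
loss.backward()

# One gradient-descent step on the loss (equivalently, gradient ascent
# on the advantage-weighted log-likelihood).
with torch.no_grad():
    updated = logits - 0.1 * logits.grad

# The action with the largest positive advantage (index 1) gains the
# most probability mass.
print(torch.softmax(updated, dim=-1))
```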
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
The loss function for an actor's policy, π, is given by: L(θ) = -E[ Σ log π(a|s) * A(s,a) ], where A(s,a) is the advantage of taking action a in state s. The training process works by minimizing this loss. If an agent takes an action that results in a large positive advantage, what is the direct effect of this event on the policy update?
An agent is being trained using an actor-critic method where the actor's loss is the negative of the expected sum of the log-probabilities of actions multiplied by their advantage values. During one training step, the agent selects an action that results in a large negative advantage. True or False: The optimization process, which aims to minimize the actor's loss, will update the policy to decrease the likelihood of selecting this action in the same state in the future.
Policy Gradient Utility for Sequence Generation
Policy Update Analysis
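For the negative-advantage case raised in the related items above, a similar sketch (again with an illustrative advantage value, not one from the source) confirms the True/False question's premise: when A(s,a) < 0, minimizing the loss lowers the sampled action's probability:

```python
import torch

# Hypothetical single step: the sampled action received a large
# negative advantage (value chosen for illustration).
advantage = torch.tensor(-3.0)
logits = torch.zeros(3, requires_grad=True)
action = 1  # the action the agent happened to select

# Monte Carlo estimate of L(theta) from the single sampled action.
log_prob = torch.log_softmax(logits, dim=-1)[action]
loss = -log_prob * advantage
loss.backward()

# Gradient-descent step; with A(s,a) < 0, minimizing the loss pushes
# log pi(a|s) down for the sampled action.
with torch.no_grad():
    updated = logits - 0.1 * logits.grad

new_prob = torch.softmax(updated, dim=-1)[action]
print(new_prob < 1.0 / 3.0)  # tensor(True): the action became less likely
```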