Definition of the Advantage Function

The advantage function, $A(s_t, a_t)$, measures the relative benefit of taking a specific action $a_t$ in state $s_t$, compared with the expected value of following the policy from that state onward. It is formally defined as the difference between the action-value function ($Q$-value) and the state-value function ($V$-value):

$$A(s_t, a_t) = Q(s_t, a_t) - V(s_t)$$

A positive advantage means the action does better than the policy's average behavior in that state; a negative advantage means it does worse. This formulation is central to methods like the Advantage Actor-Critic (A2C) algorithm, where weighting the policy gradient by the advantage focuses updates on actions that are likely to improve performance: the probability of actions with positive advantage is increased, and that of actions with negative advantage is decreased.
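
A minimal Python sketch of the definition, assuming a single state with a small discrete action set whose Q-values and policy probabilities are given as plain lists. All function names and the toy numbers are hypothetical, chosen only for illustration. `state_value` computes $V(s)$ as the policy-weighted average of the Q-values, `advantages` applies $A = Q - V$, and `td_advantage` shows the one-step estimate $r + \gamma V(s') - V(s)$, one common way an actor-critic method approximates the advantage using only a learned $V$ (A2C implementations frequently use n-step returns instead; the one-step case is shown here for brevity). `actor_loss_term` is the per-sample actor loss that the advantage weights.

```python
from typing import List


def state_value(q_values: List[float], policy_probs: List[float]) -> float:
    """V(s) = sum_a pi(a|s) * Q(s, a): the policy-weighted average of the Q-values."""
    if len(q_values) != len(policy_probs):
        raise ValueError("q_values and policy_probs must have the same length")
    return sum(p * q for p, q in zip(policy_probs, q_values))


def advantages(q_values: List[float], policy_probs: List[float]) -> List[float]:
    """A(s, a) = Q(s, a) - V(s) for every action a available in state s."""
    v = state_value(q_values, policy_probs)
    return [q - v for q in q_values]


def td_advantage(reward: float, value_s: float, value_next: float,
                 gamma: float = 0.99, done: bool = False) -> float:
    """One-step estimate A(s, a) ~ r + gamma * V(s') - V(s) using only a critic V.

    The unknown Q(s, a) is replaced by the bootstrapped target r + gamma * V(s');
    at a terminal transition there is no next state, so the bootstrap term is 0.
    """
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s


def actor_loss_term(log_prob: float, advantage: float) -> float:
    """Per-sample actor loss -A * log pi(a|s).

    Minimizing it raises pi(a|s) when A > 0 and lowers it when A < 0. In an
    autodiff implementation the advantage would be treated as a constant.
    """
    return -advantage * log_prob


if __name__ == "__main__":
    q = [1.0, 2.0, 4.0]        # hypothetical Q(s, a) for three actions
    pi = [0.5, 0.25, 0.25]     # hypothetical pi(a|s) for the same actions
    print(state_value(q, pi))  # 2.0
    print(advantages(q, pi))   # [-1.0, 0.0, 2.0]
    print(td_advantage(reward=1.0, value_s=2.0, value_next=3.0, gamma=0.5))  # 0.5
```

With these numbers the advantages average to zero under the policy ($0.5 \cdot (-1) + 0.25 \cdot 0 + 0.25 \cdot 2 = 0$). This follows from $V(s)$ being the policy's average Q-value: the advantage only says how much better or worse an action is than that average.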

Updated 2026-05-01

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences