Formula

Advantage Function Definition

The advantage function, $A(s_t, a_t)$, quantifies the relative benefit of taking a specific action $a_t$ compared to the expected value of following the policy from state $s_t$ onward. It is formally defined as the difference between the action-value function, $Q(s_t, a_t)$, and the state-value function, $V(s_t)$:

$$A(s_t, a_t) = Q(s_t, a_t) - V(s_t)$$

A positive advantage value suggests the action is better than the expected policy value, while a negative value suggests it is worse. This measure is crucial in methods like A2C as it helps focus policy updates on actions likely to improve performance.
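As a minimal sketch of the definition, the snippet below computes advantages for a single state with a small discrete action set. It assumes that the state value is the policy-weighted average of the action values, $V(s) = \sum_a \pi(a \mid s)\, Q(s, a)$. The function name `advantages` and all numbers are hypothetical and chosen only for illustration; in practice both $Q$ and $V$ are usually estimated rather than known.

```python
def advantages(q_values, policy_probs):
    """Return A(s, a) = Q(s, a) - V(s) for every action in one state.

    Assumption: V(s) is the policy-weighted average of Q(s, a),
    i.e. V(s) = sum_a pi(a | s) * Q(s, a).
    """
    if len(q_values) != len(policy_probs):
        raise ValueError("q_values and policy_probs must have the same length")
    # State value: expected action value under the policy.
    state_value = sum(p * q for p, q in zip(policy_probs, q_values))
    # Advantage: how much each action beats (or trails) that expectation.
    return [q - state_value for q in q_values]


# Hypothetical values for one state with three actions.
q_values = [1.0, 2.0, 4.0]
policy_probs = [0.5, 0.25, 0.25]

adv = advantages(q_values, policy_probs)
print(adv)  # [-1.0, 0.0, 2.0]
```

Here $V(s) = 2.0$, so the first action is worse than the policy's expectation, the second matches it, and the third is better. Under the assumption above, the policy-weighted average of the advantages is zero, which is why the advantage can be read as a value centered on the policy's own expected performance.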

Updated 2026-05-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Related