Formula

Advantage Function in Terms of Q-values and V-values

The advantage function, $A(s_t, a_t)$, defines the benefit of selecting a particular action $a_t$ in a state $s_t$ relative to the expected value of following the policy from that state onward. It is calculated as the difference between the action-value function ($Q$-value) for the specific state-action pair and the state-value function ($V$-value) for that state. The formula is:

$$A(s_t, a_t) = Q(s_t, a_t) - V(s_t)$$

A positive advantage indicates that the action is better than the expected policy outcome, while a negative advantage suggests it is worse.
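As a minimal sketch of the advantage computation, the snippet below works through a single state with three actions. The Q-values and policy probabilities are made-up illustrative numbers, and the helper name `advantages` is hypothetical. It assumes a discrete action space and uses the standard identity that the state value is the policy-weighted average of the Q-values, V(s) = Σ_a π(a|s) Q(s, a).

```python
def advantages(q_values, policy_probs):
    """Return (advantages, state_value) for one state.

    q_values[i]     : Q(s, a_i) for each action a_i (illustrative inputs)
    policy_probs[i] : pi(a_i | s), assumed to sum to 1
    """
    # V(s) is the expected Q-value under the policy.
    state_value = sum(p * q for p, q in zip(policy_probs, q_values))
    # A(s, a) = Q(s, a) - V(s) for every action.
    return [q - state_value for q in q_values], state_value


# Hypothetical numbers chosen only for illustration.
q_values = [1.0, 2.0, 4.0]
policy_probs = [0.5, 0.25, 0.25]

adv, v = advantages(q_values, policy_probs)
print("V(s) =", v)
print("A(s, a) =", adv)
```

With these numbers the first action has a negative advantage (worse than the policy's average), the second has zero advantage, and the third has a positive one. The policy-weighted average of the advantages is zero, which follows directly from how V(s) is defined here.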

Updated 2026-05-01

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences