Formula

Advantage Function Formula

The Advantage Function, $A(s_t, a_t)$, measures the relative benefit of taking a specific action $a_t$ in a state $s_t$ compared to the expected value from that state onward. It is calculated by subtracting the state-value function, $V(s_t)$, which acts as a baseline ($b$), from the sum of future rewards. The formula is:

$A(s_t, a_t) = \sum_{k=t}^{T} r_k - V(s_t)$
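The sketch below computes the formula for every time step of one finished episode. It assumes the rewards $r_t, \dots, r_T$ and the value estimates $V(s_t)$ (for example, from a learned critic) are already available. The function name `advantages` and the sample numbers are hypothetical. No discount factor is applied, which matches the undiscounted sum in the formula.

```python
from typing import Sequence


def advantages(rewards: Sequence[float], values: Sequence[float]) -> list[float]:
    """Return A(s_t, a_t) = sum_{k=t}^{T} r_k - V(s_t) for each step t.

    rewards[t] is r_t and values[t] is the baseline V(s_t) for one episode
    (hypothetical inputs, e.g. values predicted by a critic network).
    """
    if len(rewards) != len(values):
        raise ValueError("need exactly one value estimate per time step")

    out = [0.0] * len(rewards)
    reward_to_go = 0.0
    # Walk backwards so each sum_{k=t}^{T} r_k reuses the sum for t + 1.
    for t in range(len(rewards) - 1, -1, -1):
        reward_to_go += rewards[t]
        # Subtract the baseline b = V(s_t) from the reward-to-go.
        out[t] = reward_to_go - values[t]
    return out


# Made-up three-step episode for illustration only.
print(advantages([1.0, 0.0, 2.0], [2.5, 1.0, 1.5]))  # [0.5, 1.0, 0.5]
```

Accumulating the reward-to-go from the end of the episode avoids recomputing each sum from scratch, so the cost is linear in the episode length. A positive advantage means the action did better than the baseline predicted; a negative one means it did worse.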


Updated 2026-05-01


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences