Simplifying Advantage Function Calculation in A2C
At first glance, the Advantage Actor-Critic (A2C) model may seem challenging to develop because the advantage function $A(s_t, a_t) = Q(s_t, a_t) - V(s_t)$ appears to require two separate sub-models: one for the action-value function $Q(s_t, a_t)$ and one for the state-value function $V(s_t)$. However, by expressing the $Q$-value as the immediate reward plus the value of the next state, $Q(s_t, a_t) = r_t + V(s_{t+1})$, the equation can be rewritten as $A(s_t, a_t) = r_t + V(s_{t+1}) - V(s_t)$. Introducing the discount factor $\gamma$ generalizes this to the temporal difference (TD) error: $A(s_t, a_t) = r_t + \gamma V(s_{t+1}) - V(s_t)$. This means A2C only needs to train a single critic network for the value function $V$ to compute the advantage.
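As a rough illustration of why one critic is enough, the sketch below computes the one-step advantage as the TD error, reward plus discounted next-state value minus current-state value, using only value estimates. The function names `td_advantage` and `td_advantages`, the `done` flag, and the sample numbers are assumptions made for this example; in practice the value estimates would come from the trained critic network rather than a hand-written list.

```python
def td_advantage(reward, value, next_value, gamma=0.99, done=False):
    """One-step advantage estimate: r_t + gamma * V(s_{t+1}) - V(s_t)."""
    # Assumption: at a terminal transition there is no next state,
    # so its value is treated as 0.
    bootstrap = 0.0 if done else next_value
    return reward + gamma * bootstrap - value


def td_advantages(rewards, values, gamma=0.99):
    """Advantages for a sequence of steps.

    `values` holds the critic's estimate V(s_t) for every visited state,
    plus one extra entry for the state reached after the last step.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    return [
        td_advantage(r, values[t], values[t + 1], gamma)
        for t, r in enumerate(rewards)
    ]


# Hypothetical rewards and critic outputs, chosen only for illustration.
rewards = [1.0, 0.0, 2.0]
values = [0.5, 1.0, 1.5, 0.0]
print(td_advantages(rewards, values, gamma=0.5))  # [1.0, -0.25, 0.5]
```

Note that no action-value estimate appears anywhere: each advantage is built from the observed reward and two outputs of the same value function, which is what lets the critic be a single network.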
Tags
Foundations of Large Language Models
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences