
Advantage Actor-Critic (A2C) Method

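The Advantage Actor-Critic (A2C) method is a reinforcement learning algorithm that optimizes a policy through two interacting components. The actor learns the policy by updating its parameters with a policy gradient objective weighted by the advantage function $A(s_t, a_t)$. The advantage measures how much better action $a_t$ is than the average action in state $s_t$, so actions with positive advantage become more likely and actions with negative advantage become less likely. The critic acts as an evaluator. It updates its estimate of the state-value function $V(s_t)$, which is then used to compute the advantage, $A(s_t, a_t) = Q(s_t, a_t) - V(s_t)$. In practice this is commonly approximated by the one-step temporal-difference error $r_t + \gamma V(s_{t+1}) - V(s_t)$, where $r_t$ is the reward received after taking $a_t$ and $\gamma$ is the discount factor.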
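The sketch below shows the actor and critic updates on a toy problem. It rests on several assumptions that do not come from the source. The environment is a hypothetical two-state MDP. The actor uses a tabular softmax policy, and the critic uses a tabular value estimate. It runs a single environment rather than the batches of parallel environments common in practical A2C implementations, and it uses the one-step TD error as the advantage estimate. The names `step`, `softmax`, and `run_a2c`, and all hyperparameter values, are illustrative.

```python
import math
import random

# Hypothetical toy episodic MDP (an assumption for illustration, not from the source):
#   state 0: action 1 ("right") -> state 1, reward 0; action 0 -> episode ends, reward 0
#   state 1: action 1 ("right") -> episode ends, reward 1; action 0 -> episode ends, reward 0
N_STATES, N_ACTIONS = 2, 2


def step(state, action):
    """Return (next_state, reward, done) for the toy MDP."""
    if state == 0:
        return (1, 0.0, False) if action == 1 else (None, 0.0, True)
    return (None, 1.0, True) if action == 1 else (None, 0.0, True)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_a2c(episodes=3000, gamma=0.9, actor_lr=0.1, critic_lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # actor: tabular policy logits
    values = [0.0] * N_STATES                             # critic: tabular V(s)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            probs = softmax(theta[state])
            action = rng.choices(range(N_ACTIONS), weights=probs)[0]
            next_state, reward, done = step(state, action)
            # Critic's estimate of V(s_{t+1}); zero once the episode has ended.
            next_value = 0.0 if done else values[next_state]
            # One-step TD error, used here as the advantage estimate A(s_t, a_t).
            advantage = reward + gamma * next_value - values[state]
            # Actor: gradient ascent on A * log pi(a_t | s_t) for a softmax policy,
            # where d log pi(a|s) / d theta[s][b] = 1[b == a] - pi(b|s).
            for b in range(N_ACTIONS):
                grad_log_pi = (1.0 if b == action else 0.0) - probs[b]
                theta[state][b] += actor_lr * advantage * grad_log_pi
            # Critic: move V(s_t) toward the TD target r_t + gamma * V(s_{t+1}).
            values[state] += critic_lr * advantage
            state = next_state
    return theta, values


theta, values = run_a2c()
print("P(right | s):", [round(softmax(row)[1], 3) for row in theta])
print("V(s):", [round(v, 3) for v in values])
```

The advantage is computed once per transition and reused for both updates. The TD error that scales the policy gradient is the same error that moves $V(s_t)$ toward its target. In a neural-network version, these two updates would typically become a policy loss and a value loss optimized together.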


Updated 2026-05-01


Tags

Foundations of Large Language Models

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences