1Cademy - Policy Gradient Methods for Deep Reinforcement Learning

In policy gradient methods, we directly learn the policy function $\pi$, which outputs a probability distribution over actions. The term $\pi(s,a;\theta) \in [0,1]$ is the probability of taking action $a$ in state $s$ under parameters $\theta$. A neural network with parameters $\theta$ can represent the policy: it takes the state as input and outputs the probability distribution over the available actions.
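As a minimal sketch of what such a policy function looks like, the snippet below uses a single linear layer followed by a softmax in place of a full neural network. The names `softmax`, `policy`, `sample_action`, and the example values of `theta` and `state` are assumptions made for illustration; they do not come from the original text.

```python
import math
import random

def softmax(logits):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy(state, theta):
    """Return pi(s, a; theta) for every action a.

    Assumption: theta holds one weight row per action, and the score of an
    action is the dot product of its row with the state vector.
    """
    logits = [sum(w * x for w, x in zip(row, state)) for row in theta]
    return softmax(logits)

def sample_action(probs, rng=random):
    """Draw an action index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical example: a 3-dimensional state and 2 actions.
theta = [[0.5, -0.2, 0.1],
         [-0.3, 0.4, 0.0]]
state = [1.0, 2.0, -1.0]
probs = policy(state, theta)   # one probability per action
action = sample_action(probs)  # index of the sampled action
```

Sampling from `probs`, rather than always picking the largest entry, is what lets the agent try actions that currently look less promising.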

The general process is:

1. The agent takes in a state and computes the probability of each action.
2. It samples an action from this probability distribution and observes the next state and reward.
3. This cycle repeats until the episode (for example, one game) ends, at which point the total reward is evaluated.
4. The parameters $\theta$ of the network are updated by gradient ascent, with gradients computed by backpropagation and weighted by the rewards.
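
One common way to write the parameter update in the final step is the REINFORCE form below. It is offered as a sketch rather than as the exact rule intended here. The symbols $\alpha$ (learning rate), $T$ (episode length), $r_t$ (reward at step $t$), and $R$ (total episode reward) are assumptions introduced for illustration.

$$
\theta \leftarrow \theta + \alpha \, R \sum_{t=0}^{T-1} \nabla_\theta \log \pi(s_t, a_t; \theta), \qquad R = \sum_{t=0}^{T-1} r_t
$$

When $R$ is positive, the update raises the log-probability of every action taken in the episode; when $R$ is negative, it lowers them.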

Because actions are sampled rather than always chosen greedily, the agent explores as it plays, and each update gradually increases the probabilities of actions that led to positive returns.
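
The loop described above can be sketched end to end on a toy problem. The example below is an illustrative assumption, not the method of any particular library: the "game" is hypothetical (the state is 0 or 1, and the reward is +1 when the action matches the state and -1 otherwise), the policy is a table of softmax scores rather than a neural network, and every action in an episode is weighted by the total episode reward.

```python
import math
import random

def softmax(logits):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def run_episode(theta, rng, steps=5):
    """Play one episode of the hypothetical toy game and record what happened."""
    trajectory, total_reward = [], 0.0
    for _ in range(steps):
        state = rng.randrange(2)
        probs = softmax([theta[a][state] for a in range(2)])
        action = rng.choices([0, 1], weights=probs, k=1)[0]
        reward = 1.0 if action == state else -1.0
        trajectory.append((state, action, probs))
        total_reward += reward
    return trajectory, total_reward

def train(episodes=2000, lr=0.05, seed=0):
    """Run the sample-then-update loop; hyperparameters are arbitrary choices."""
    rng = random.Random(seed)
    # theta[a][s] is the score (logit) of action a in state s.
    theta = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(episodes):
        trajectory, total_reward = run_episode(theta, rng)
        for state, action, probs in trajectory:
            for a in range(2):
                # d log pi(action | state) / d theta[a][state] for a softmax policy
                grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
                # Gradient ascent: weight the gradient by the episode's total reward.
                theta[a][state] += lr * total_reward * grad_log_pi
    return theta

theta = train()
for state in range(2):
    probs = softmax([theta[a][state] for a in range(2)])
    print(f"state {state}: P(action 0) = {probs[0]:.3f}, P(action 1) = {probs[1]:.3f}")
```

With a neural network policy, the hand-written `grad_log_pi` line would be replaced by backpropagation through the network, but the structure of the loop stays the same.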

Updated 2026-06-13

Contributors are from: University of Michigan - Ann Arbor
