Concept
Policy Gradient Methods for Deep Reinforcement Learning
In policy gradients , we directly learn the policy function π which gives the probability distribution of actions. Probability of taking action a given state s with parameters .
Neural networks can be used to find the policy function. The network takes state as an input and produces the probability distribution of actions.
Process in simple terms:
- Takes in a state and gets the probability of each action
- Selects the most probable action and observe next state & reward
- Repeats until the end of the game and evaluates the total rewards
- Updates the parameters in the network, based on the rewards, using backpropagation
In this way, the network allows the agent to play freely, but with every successive game, it provides better probabilities for actions that will lead the agent to a positive result.

0
1
Updated 2025-09-22
Tags
Data Science