1Cademy - Policy

Learn Before

Math Behind Reinforcement Learning

Concept

Policy

A policy decides which action is selected by the agent at some state. Two criteria can be used to classify policies.

First, depending on whether policy is time-dependent or not, we can categorize them as either stationary or non-stationary. If the cumulative rewards the agent aims to optimize are limited to a finite number of future steps, we could use non-stationary policies. But in most cases, we will use stationary policies.

Second, policies can also be categorized under the criterion of being either deterministic or stochastic. In the deterministic case, what action will be taken at some certain state is determined. In the stochastic case, given the action and state, the policy will return the probability that action a may be chosen in state s.

0

1

Updated 2025-09-16

Contributors are:

Who are from:

University of Michigan - Ann Arbor

🏆 1

References

Reference for Deep Reinforcement Learning

Learn Before

Related

Learn After