Learn Before
Policy
A policy decides which action is selected by the agent at some state. Two criteria can be used to classify policies.
First, depending on whether policy is time-dependent or not, we can categorize them as either stationary or non-stationary. If the cumulative rewards the agent aims to optimize are limited to a finite number of future steps, we could use non-stationary policies. But in most cases, we will use stationary policies.
Second, policies can also be categorized under the criterion of being either deterministic or stochastic. In the deterministic case, what action will be taken at some certain state is determined. In the stochastic case, given the action and state, the policy will return the probability that action a may be chosen in state s.
0
1
Tags
Data Science