Concept
Markov Decision Process
A Markov decision process is a tuple (S, A, P, γ, R), where:
• S is a finite set of states.
• A is a finite set of actions.
• P is the state transition probability matrix: P_{ss'}^{a} = P(s_{t+1} = s' | s_t = s, a_t = a).
• γ ∈ [0, 1] is called the discount factor.
• R : S × A → ℝ is a reward function.
The dynamics of a Markov decision process proceed as follows: we start in some state s_0 and choose some action a_0 ∈ A to take. As a result of our choice, the state of the MDP randomly transitions to some successor state s_1, drawn from P_{s_0 a_0}. Then, from state s_1, we pick another action a_1. Again, we arrive at some state s_2 drawn from P_{s_1 a_1}, and so on.
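The definition and dynamics above can be sketched in code. The following is a minimal illustration, not a definitive implementation: the 2-state, 2-action MDP (its transition probabilities, rewards, and the always-pick-action-0 policy) is entirely made up for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 2-action MDP; all numbers are illustrative.
S = [0, 1]        # finite set of states
A = [0, 1]        # finite set of actions
gamma = 0.9       # discount factor, gamma in [0, 1]

# P[s, a, s'] = probability of transitioning to s' after action a in state s.
P = np.array([[[0.8, 0.2],    # from state 0, action 0
               [0.1, 0.9]],   # from state 0, action 1
              [[0.5, 0.5],    # from state 1, action 0
               [0.3, 0.7]]])  # from state 1, action 1

# R[s, a] = reward for taking action a in state s (R : S x A -> reals).
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

def sample_trajectory(s0, policy, steps=5):
    """Roll out the dynamics: start in s0, pick a0, draw s1 ~ P_{s0 a0}, ..."""
    s, ret, discount = s0, 0.0, 1.0
    trajectory = []
    for _ in range(steps):
        a = policy(s)                          # choose an action in state s
        ret += discount * R[s, a]              # accumulate discounted reward
        discount *= gamma
        s_next = rng.choice(len(S), p=P[s, a]) # s' drawn from P_{s a}
        trajectory.append((s, a, s_next))
        s = s_next
    return trajectory, ret

traj, ret = sample_trajectory(0, policy=lambda s: 0)
```

Each entry of `traj` is an (s, a, s') step of the chain s_0, a_0, s_1, a_1, ..., and `ret` is the discounted sum of rewards collected along the way.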
Updated 2020-10-17
Tags
Data Science