Concept

Markov Decision Process

A Markov decision process is a tuple (S, A, P, γ, R), where:

• S is a finite set of states.
• A is a finite set of actions.
• P is the state transition probability matrix, with P^a_{ss'} = P[S_{t+1} = s' | S_t = s, A_t = a].
• γ ∈ [0, 1] is called the discount factor.
• R : S × A → ℝ is a reward function.
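The tuple above can be sketched directly as plain Python data. This is a minimal illustration, assuming a toy two-state, two-action MDP; the state names, probabilities, and rewards are made up for the example.

```python
# Toy MDP: states, actions, discount factor (all values are assumptions).
S = ["s0", "s1"]          # finite set of states
A = ["stay", "move"]      # finite set of actions
gamma = 0.9               # discount factor, gamma in [0, 1]

# P[(s, a)] maps a (state, action) pair to a distribution over successor
# states: P^a_{ss'} = P[S_{t+1} = s' | S_t = s, A_t = a].
P = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}

# R : S x A -> real-valued reward.
R = {("s0", "stay"): 0.0, ("s0", "move"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "move"): -1.0}

# Sanity check: every row of P is a valid probability distribution.
assert all(abs(sum(dist.values()) - 1.0) < 1e-9 for dist in P.values())
```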

The dynamics of a Markov decision process proceed as follows: we start in some state s_0 and choose some action a_0 ∈ A to take. As a result of our choice, the state of the process randomly transitions to some successor state s_1, drawn according to P^{a_0}_{s_0 s_1}. Then, from state s_1, we pick another action a_1, arrive at some state s_2, and so on.
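The sampling loop described above can be sketched in a few lines of Python. This is a hedged illustration, not a reference implementation: the toy transition table and the always-"move" policy are assumptions introduced for the example.

```python
import random

# Toy transition table (an assumption for illustration):
# P[(s, a)][s'] = P^a_{ss'} = P[S_{t+1} = s' | S_t = s, A_t = a].
P = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}

def sample_next_state(s, a, rng):
    """Draw s_{t+1} from the distribution P^a_{s s'}."""
    dist = P[(s, a)]
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

def rollout(s0, policy, horizon, seed=0):
    """Generate the sequence s_0, a_0, s_1, a_1, ... for `horizon` steps."""
    rng = random.Random(seed)
    s, trajectory = s0, []
    for _ in range(horizon):
        a = policy(s)              # choose an action in the current state
        trajectory.append((s, a))
        s = sample_next_state(s, a, rng)  # transition to a successor state
    return trajectory

# A trivial policy that always picks "move" (a placeholder assumption).
traj = rollout("s0", lambda s: "move", horizon=5)
```

Running the rollout yields a list of (state, action) pairs, one per time step, which is exactly the s_0, a_0, s_1, a_1, … sequence described in the text.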


Updated 2020-10-17

Tags

Data Science