Concept

Markov Decision Process

A Markov decision process is a tuple (S, A, P, γ, R), where:

• S is a finite set of states.
• A is a finite set of actions.
• P is the state transition probability matrix, with P^a_{ss'} = P[S_{t+1} = s' | S_t = s, A_t = a].
• γ ∈ [0, 1] is called the discount factor.
• R : S × A → ℝ is a reward function.
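The tuple above can be sketched directly as plain Python data. This is a minimal illustration, assuming a toy two-state, two-action MDP; the state names, probabilities, and rewards are made up for the example.

```python
# Toy MDP: states, actions, discount factor (all values are assumptions).
S = ["s0", "s1"]          # finite set of states
A = ["stay", "move"]      # finite set of actions
gamma = 0.9               # discount factor, gamma in [0, 1]

# P[(s, a)] maps a (state, action) pair to a distribution over successor
# states: P^a_{ss'} = P[S_{t+1} = s' | S_t = s, A_t = a].
P = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}

# R : S x A -> real-valued reward.
R = {("s0", "stay"): 0.0, ("s0", "move"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "move"): -1.0}

# Sanity check: every row of P is a valid probability distribution.
assert all(abs(sum(dist.values()) - 1.0) < 1e-9 for dist in P.values())
```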

The dynamics of a Markov decision process proceed as follows: we start in some state s_0 and choose some action a_0 ∈ A to take. As a result of our choice, the state of the process randomly transitions to some successor state s_1, drawn according to P^{a_0}_{s_0 s_1}. Then, from state s_1, we pick another action a_1, arrive at some state s_2, and so on.
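The sampling loop described above can be sketched in a few lines of Python. This is a hedged illustration, not a reference implementation: the toy transition table and the always-"move" policy are assumptions introduced for the example.

```python
import random

# Toy transition table (an assumption for illustration):
# P[(s, a)][s'] = P^a_{ss'} = P[S_{t+1} = s' | S_t = s, A_t = a].
P = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}

def sample_next_state(s, a, rng):
    """Draw s_{t+1} from the distribution P^a_{s s'}."""
    dist = P[(s, a)]
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

def rollout(s0, policy, horizon, seed=0):
    """Generate the sequence s_0, a_0, s_1, a_1, ... for `horizon` steps."""
    rng = random.Random(seed)
    s, trajectory = s0, []
    for _ in range(horizon):
        a = policy(s)              # choose an action in the current state
        trajectory.append((s, a))
        s = sample_next_state(s, a, rng)  # transition to a successor state
    return trajectory

# A trivial policy that always picks "move" (a placeholder assumption).
traj = rollout("s0", lambda s: "move", horizon=5)
```

Running the rollout yields a list of (state, action) pairs, one per time step, which is exactly the s_0, a_0, s_1, a_1, … sequence described in the text.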


Updated 2020-10-17

Tags

Data Science