Learn Before
Concept

Markov Process

A Markov Process is a tuple $(S, P)$, where
• $S$ is a (finite) set of states
• $P$ is a state transition probability matrix, with $P_{ss'} = P[S_{t+1} = s' \mid S_t = s]$.

We impose a constraint on states. A sequence of states is Markov if and only if the probability of moving to the next state $S_{t+1}$ depends only on the present state $S_t$ and not on the previous states $S_1, S_2, \ldots, S_{t-1}$. That is, for all $t$,

$$P[S_{t+1} \mid S_t] = P[S_{t+1} \mid S_1, S_2, \ldots, S_t].$$

In reinforcement learning, the Markov Process is assumed to be time-homogeneous; that is, the transition probability is independent of $t$:

$$P[S_{t+1} = s' \mid S_t = s] = P[S_t = s' \mid S_{t-1} = s].$$
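The definition above can be sketched in code: a minimal simulation of a time-homogeneous Markov process, where the next state is sampled from a row of the transition matrix indexed only by the present state. The two-state weather chain (the `STATES` names and the matrix `P`) is an illustrative assumption, not part of the text.

```python
import random

# Illustrative two-state chain (assumed example, not from the text).
STATES = ["sunny", "rainy"]

# P[i][j] = probability of moving from state i to state j.
# Each row is a probability distribution and must sum to 1.
P = [
    [0.9, 0.1],  # transitions from "sunny"
    [0.5, 0.5],  # transitions from "rainy"
]

def step(state_idx, rng):
    """Sample the next state index given only the current one.

    The distribution depends solely on the present state (Markov
    property) and not on the time step t (time-homogeneity)."""
    r = rng.random()
    cumulative = 0.0
    for j, p in enumerate(P[state_idx]):
        cumulative += p
        if r < cumulative:
            return j
    return len(P[state_idx]) - 1  # guard against floating-point rounding

def simulate(start_idx, n_steps, seed=0):
    """Generate a state sequence of length n_steps + 1."""
    rng = random.Random(seed)
    path = [start_idx]
    for _ in range(n_steps):
        path.append(step(path[-1], rng))
    return [STATES[i] for i in path]
```

Because the chain is time-homogeneous, the same matrix `P` is reused at every step; nothing about the sampling rule changes with $t$.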


Updated 2025-08-31

Tags

Data Science

Learn After