Concept

Benefits of Value-based Methods

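Value-based methods aim to learn value functions. The advantage of learning a value function, in particular an action-value function Q(s, a), is that actions can then be selected without a model of the Markov decision process (MDP), i.e. without knowing its transition probabilities or reward function. For example, in Q-learning, the optimal policy is obtained by acting greedily with respect to the learned estimate of Q*: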

$$\pi^{\star}(s)=\underset{a \in \mathcal{A}}{\operatorname{argmax}}\, \hat{Q}^{\star}(s, a)$$
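To make the "no model needed" point concrete, here is a minimal sketch under stated assumptions. The five-state corridor environment, the helper names `step`, `greedy`, and `q_learning`, and all hyperparameters are hypothetical choices for illustration, not taken from the original. Tabular Q-learning builds the estimate of Q* purely from sampled transitions. The policy is then read off with the argmax above, without ever consulting transition probabilities.

```python
import random
from collections import defaultdict

# Hypothetical toy environment (an assumption for illustration):
# a corridor of states 0..4; action 0 = left, 1 = right.
# Entering state 4 yields reward 1 and ends the episode.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (0, 1)


def step(state, action):
    """Sample one transition. The learner only sees this sampled outcome,
    never a table of transition probabilities or rewards."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def greedy(q, state, rng=None):
    """argmax_a Q-hat(s, a); ties broken randomly if rng is given, else by lowest action."""
    best = max(q[(state, a)] for a in ACTIONS)
    ties = [a for a in ACTIONS if q[(state, a)] == best]
    return rng.choice(ties) if rng is not None else ties[0]


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; hyperparameter values are illustrative assumptions."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-hat(s, a), initialised to 0
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy behaviour policy for exploration
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, s, rng)
            s2, r, done = step(s, a)
            # Q-learning target: reward plus discounted best next value (0 at the goal)
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q


q_hat = q_learning()
# Acting greedily needs only Q-hat: no model of the MDP is consulted here.
policy = {s: greedy(q_hat, s) for s in range(GOAL)}
print(policy)
```

In this toy setup the greedy policy should choose action 1 (move right) in every non-terminal state. Note that `step` is used only during learning, to draw samples. The final `policy` dictionary is built from `q_hat` alone, which is exactly what the argmax formula expresses.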

Updated 2020-10-17

Tags

Data Science