
Deep Q-learning

One value function frequently used in value-based methods is the action-value function Q(s,a), which represents the expected total reward of taking action a in state s and then following a policy π. It is the expected sum of future rewards r, where each reward is multiplied by a discount factor γ (gamma) once for every step it lies in the future. The optimal action-value function Q* takes the best value achievable by any policy:

Q^{*}(s, a)=\max_{\pi} \mathbb{E}\left[r_{t}+\gamma r_{t+1}+\gamma^{2} r_{t+2}+\ldots \mid s_{t}=s, a_{t}=a, \pi\right]
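The quantity inside the expectation is a discounted sum of rewards. A minimal sketch of that sum for one finite, already-observed trajectory is shown below; the function name `discounted_return` is hypothetical. A single trajectory gives only one sample of the return, and Q(s,a) is the expectation of this value over trajectories that start with action a in state s.

```python
def discounted_return(rewards, gamma):
    """Return r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ... for one finite trajectory.

    Hypothetical helper for illustration; `rewards` is the list [r_t, r_{t+1}, ...].
    """
    g = 0.0
    # Work backwards using the recursion G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # 1 + 0.9 * 1 + 0.9**2 * 1, i.e. approximately 2.71
    print(discounted_return([1.0, 1.0, 1.0], 0.9))
```

Summing backwards avoids computing powers of γ explicitly, and it is the same recursion that Q-learning exploits when it bootstraps from the next state's value.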

The basic steps of a deep Q-learning algorithm are:

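  1. Feed the state (for example, raw game frames) into a convolutional neural network, which extracts the essential features that help the agent make a decision.
  2. From those features, compute the Q-value of each possible action; the agent usually picks the action with the highest Q-value, with occasional random actions for exploration.
  3. Use back-propagation to update the network weights so that each predicted Q-value moves toward the target r + γ max_a' Q(s', a'), making the Q-value estimates more accurate over time.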
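The three steps above can be sketched in plain Python under several loudly stated assumptions. The environment is a hypothetical 5-cell corridor (`step`) where reaching the last cell gives reward 1. A one-hot feature vector (`features`) stands in for the convolutional network of step 1, and a linear model stands in for the deep network, so the "back-propagation" of step 3 reduces to one semi-gradient step (the target is treated as a constant). All names are illustrative, and the sketch omits the experience replay and target network used by DQN.

```python
import random

N_STATES = 5    # hypothetical corridor cells 0..4; cell 4 is the terminal goal
N_ACTIONS = 2   # 0 = move left, 1 = move right
GAMMA = 0.9     # discount factor
LR = 0.2        # learning rate for the gradient step
EPSILON = 0.3   # probability of a random (exploratory) action


def features(state):
    """Step 1 stand-in: one-hot features (a CNN would extract these from pixels)."""
    phi = [0.0] * N_STATES
    phi[state] = 1.0
    return phi


def q_values(weights, phi):
    """Step 2: one Q-value per action, Q(s, a) = w_a . phi(s)."""
    return [sum(w * x for w, x in zip(weights[a], phi)) for a in range(N_ACTIONS)]


def step(state, action):
    """Hypothetical toy environment: reward 1 only when the goal cell is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(qs, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(qs)
    return rng.choice([a for a, q in enumerate(qs) if q == best])


def train(episodes=3000, seed=0):
    rng = random.Random(seed)
    weights = [[0.0] * N_STATES for _ in range(N_ACTIONS)]
    for _ in range(episodes):
        state = rng.randrange(N_STATES - 1)  # random non-terminal start cell
        for _ in range(100):                 # cap the episode length
            phi = features(state)
            qs = q_values(weights, phi)
            if rng.random() < EPSILON:
                action = rng.randrange(N_ACTIONS)
            else:
                action = greedy(qs, rng)
            nxt, reward, done = step(state, action)
            # Target r + gamma * max_a' Q(s', a'); no bootstrapping from the terminal cell.
            target = reward if done else reward + GAMMA * max(q_values(weights, features(nxt)))
            td_error = target - qs[action]
            # Step 3: gradient step on 0.5 * td_error**2 for the chosen action's weights.
            # For a linear model, dQ/dw_a is just phi, so this is one-layer back-propagation.
            for i in range(N_STATES):
                weights[action][i] += LR * td_error * phi[i]
            if done:
                break
            state = nxt
    return weights


if __name__ == "__main__":
    w = train()
    policy = [max(range(N_ACTIONS), key=lambda a: q_values(w, features(s))[a])
              for s in range(N_STATES - 1)]
    print(policy)  # [1, 1, 1, 1]: move right in every non-terminal cell
```

Swapping `features` and `q_values` for a convolutional network (and the manual update for an autodiff optimizer) gives the structure described in the list; the rest of the loop stays the same.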

Updated 2020-10-22

Tags

Data Science
