Learn Before
DQN
DQN is a milestone work in reinforcement learning. It introduces an end-to-end deep neural network to approximate the value function, solving the problem that a traditional Q-table cannot handle high-dimensional or continuous inputs. Two key tricks make it work: experience replay and a fixed Q-target. Experience replay breaks the dependency between consecutive samples and lets each sample be reused multiple times. The fixed Q-target stabilizes the bootstrap target and plays a key role in helping the model converge.
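The two tricks above can be sketched in isolation. This is a minimal illustration under assumed names (`ReplayBuffer`, `sync_target`), not the original DeepMind implementation: the buffer samples decorrelated minibatches of past transitions, and the target "network" (represented here by a plain weight list) is only updated when explicitly synchronized.

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay: store (state, action, reward, next_state, done)
    transitions and sample random minibatches, breaking temporal dependency
    between consecutive samples and allowing reuse of past experience."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


def sync_target(online_weights, target_weights):
    """Fixed Q-target: copy the online network's weights into the target
    network, which then stays frozen until the next synchronization."""
    target_weights[:] = online_weights


buffer = ReplayBuffer(capacity=100)
for t in range(10):
    buffer.push((t, 0, 1.0, t + 1, False))  # dummy transitions

batch = buffer.sample(4)     # a decorrelated minibatch of past experience
online = [0.5, -0.2]         # stand-in for online-network parameters
target = [0.0, 0.0]
sync_target(online, target)  # target now mirrors the online weights
```

In a full agent, the TD target would be computed from `target`, which changes only at synchronization steps, so the regression target does not shift on every gradient update.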
Tags
Data Science
Related
Pros and Cons of Actor-Critic Method
DQN
DDPG
Role of the Critic in Advantage Function Calculation
Robotic Chef Learning Paradigm
An autonomous agent is at a specific position in a grid world and must choose one of four directions to move (up, down, left, right). A purely value-based agent would estimate the long-term value of moving in each of the four directions and deterministically choose the direction with the highest estimated value. How does the decision-making process of an agent using an actor-critic method fundamentally differ in this same situation?
Definition of the Advantage Function
Training of Reward Models
In a reinforcement learning framework that separates the decision-making process from the evaluation process, there are two key components. Match each component to its primary function and the nature of its output.
Advantage Actor-Critic (A2C) Method