It's very important to understand how we define a basic reward function in reinforcement learning and its principia mathematica. The basic intuition of reward fucntion in reinforcement learning is the Bellman Equation, which describes the expected reward. And we want to maximize the expected reward. 
The Bellman Equation is:
$v(s) = E[R_{t+1}+\lambda v(S_{t+1})|S_t = s]$

University of Michigan - Ann Arbor

A reward is a feedback used to effectively evaluate the agent’s action, can be immediate or delayed. Depending on the reward, the event can be considered good or bad. In case agent receives a low reward after performing particular action which was guided by the policy, then the policy would be modified.  The purpose of the agent is to maximize the reward.
 

Reward in Reinforcement Learning

https://cims.nyu.edu/~donev/Teaching/WrittenOral/Projects/XintianHan-WrittenAndOral.pdf
https://www.davidsilver.uk/teaching/     
(David Silver's tutorial of reinforcement learning)
http://rail.eecs.berkeley.edu/deeprlcourse/
https://katefvision.github.io/
http://incompleteideas.net/rlai.cs.ualberta.ca/RLAI/RLAIcourse/RLAIcourse2006.html
http://web.stanford.edu/class/cs234/schedule.html

Useful Tutorials about Math behind Reinforcement Learning

Rewards focus on the immediate context while value functions can focus on the long term. For instance, an action can have a low immediate reward while in long term can have a high value.

Reward vs. Value Function

Rewards are very important as they are used for estimating value. Estimating rewards are relatively easy compared to calculating values which can be quite challenging. The values are calculated after each time step and after each time step it is important to receive highest value and not the highest reward. 

"While return gives the expected discounted sum of rewards for one episode,
a value function gives the expected discounted sum of rewards from a certain state"

 Rewards, Returns and Value functions

For traditional Q-learning method, there might be no need to train a reward function. For these methods, a reward form is kept to record the reward for every (state, action) pair.  However, to deal with the curse of the dimensionality, a function is provided instead of the form. We try to use the targeted Q value used in the updating process of Q-learning as the tag to train a predict network for that. That's exactly why function approximation is needed. If there exists a well-formed policy to search through (action, state) space to get the best action or fairly small (action, state) space, there is no need to do a function approximation.

Learn Before

Related