Learn Before
Analyze the value-based reward shaping formula, r' = r + γV(s_{t+1}) - V(s_t), by matching each component to its specific role or definition within the general structure of potential-based reward shaping.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Advantage Function as a Form of Shaped Reward
Calculating a Shaped Reward
An agent is being trained using value-based reward shaping. In a particular transition from state s_t to s_{t+1}, the agent receives an environmental reward r of 0. The agent's current value function estimates that the value of the next state, V(s_{t+1}), is substantially higher than the value of the current state, V(s_t). Based on the formula r' = r + γV(s_{t+1}) - V(s_t), what is the most likely consequence of this shaping on the agent's learning for this specific transition?
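The scenario above can be sketched directly from the formula. This is a minimal illustrative example, with the value estimates (v_s, v_s_next) and discount factor chosen as assumptions, not taken from the card:

```python
# Potential-based reward shaping using the value function as the potential:
#   r' = r + gamma * V(s_{t+1}) - V(s_t)

def shaped_reward(r, v_s, v_s_next, gamma=0.99):
    """Compute the shaped reward r' = r + gamma * V(s_{t+1}) - V(s_t)."""
    return r + gamma * v_s_next - v_s

# The card's scenario: the environmental reward is 0, and the next state's
# estimated value is substantially higher than the current state's.
# (v_s=1.0 and v_s_next=5.0 are hypothetical values for illustration.)
r_prime = shaped_reward(r=0.0, v_s=1.0, v_s_next=5.0)
print(r_prime)  # positive: the shaping bonus encourages this transition
```

Because V(s_{t+1}) exceeds V(s_t), the shaped reward r' is positive even though the environmental reward r is 0, so the agent receives an immediate learning signal favoring this transition.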