Learn Before
Calculating a Shaped Reward
A reinforcement learning agent is navigating a maze. It takes an action from its current state, s_t, which leads it to a new state, s_{t+1}. The agent's goal is to learn a good path by adjusting its behavior based on the rewards it receives. To help guide the agent more effectively, the standard environmental reward is transformed using the agent's own value estimates for the states.
Using the values given for this state transition, calculate the new, transformed reward (r') with the formula below.
Formula:
r'(s_t, a_t, s_{t+1}) = r(s_t, a_t, s_{t+1}) + γV(s_{t+1}) - V(s_t)
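As a minimal sketch, the formula can be computed directly; the function name and the numeric values below are assumptions chosen for illustration, not part of the original card:

```python
def shaped_reward(r, v_s, v_s_next, gamma):
    """Value-based reward shaping: r' = r + gamma * V(s_{t+1}) - V(s_t)."""
    return r + gamma * v_s_next - v_s

# Hypothetical transition: environmental reward 0, V(s_t) = 2, V(s_{t+1}) = 5,
# discount factor gamma = 0.9.
r_prime = shaped_reward(r=0.0, v_s=2.0, v_s_next=5.0, gamma=0.9)
print(r_prime)  # 0 + 0.9 * 5 - 2 = 2.5
```

Note that the shaping term γV(s_{t+1}) - V(s_t) is positive exactly when the discounted value of the next state exceeds the value of the current state.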
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Advantage Function as a Form of Shaped Reward
Calculating a Shaped Reward
An agent is being trained using value-based reward shaping. In a particular transition from state s_t to s_{t+1}, the agent receives an environmental reward r of 0. The agent's current value function estimates that the value of the next state, V(s_{t+1}), is substantially higher than the value of the current state, V(s_t). Based on the formula r' = r + γV(s_{t+1}) - V(s_t), what is the most likely consequence of this shaping on the agent's learning for this specific transition?
Analyze the value-based reward shaping formula, r' = r + γV(s_{t+1}) - V(s_t), by matching each component to its specific role or definition within the general structure of potential-based reward shaping.
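The scenario in the first related question above can be sketched numerically; the discount factor and value estimates here are assumed placeholders, chosen only so that V(s_{t+1}) > V(s_t) as the question states:

```python
# Assumed values: r = 0 as given; gamma and the value estimates are hypothetical.
gamma = 0.99
r = 0.0
v_s, v_s_next = 1.0, 4.0  # V(s_{t+1}) substantially higher than V(s_t)

# r' = r + gamma * V(s_{t+1}) - V(s_t)
r_shaped = r + gamma * v_s_next - v_s
print(r_shaped > 0)  # True: the shaped reward is positive even though r = 0
```

With r = 0 the sign of r' is determined entirely by the shaping term, so a transition into a higher-valued state is reinforced before any environmental reward arrives.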