Learn Before
Analyze the value-based reward shaping formula, r' = r + γV(s_{t+1}) - V(s_t), by matching each component to its specific role or definition within the general structure of potential-based reward shaping.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Advantage Function as a Form of Shaped Reward
Calculating a Shaped Reward
An agent is being trained using value-based reward shaping. In a particular transition from state s_t to s_{t+1}, the agent receives an environmental reward r of 0. The agent's current value function estimates that the value of the next state, V(s_{t+1}), is substantially higher than the value of the current state, V(s_t). Based on the formula r' = r + γV(s_{t+1}) - V(s_t), what is the most likely consequence of this shaping on the agent's learning for this specific transition?
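The scenario above can be sketched directly from the formula. This is a minimal illustrative example, with the value estimates (v_s, v_s_next) and discount factor chosen as assumptions, not taken from the card:

```python
# Potential-based reward shaping using the value function as the potential:
#   r' = r + gamma * V(s_{t+1}) - V(s_t)

def shaped_reward(r, v_s, v_s_next, gamma=0.99):
    """Compute the shaped reward r' = r + gamma * V(s_{t+1}) - V(s_t)."""
    return r + gamma * v_s_next - v_s

# The card's scenario: the environmental reward is 0, and the next state's
# estimated value is substantially higher than the current state's.
# (v_s=1.0 and v_s_next=5.0 are hypothetical values for illustration.)
r_prime = shaped_reward(r=0.0, v_s=1.0, v_s_next=5.0)
print(r_prime)  # positive: the shaping bonus encourages this transition
```

Because V(s_{t+1}) exceeds V(s_t), the shaped reward r' is positive even though the environmental reward r is 0, so the agent receives an immediate learning signal favoring this transition.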