Formula

Value-Based Reward Shaping Formula

This formula presents a specific application of potential-based reward shaping in which the state-value function, $V(s)$, is used as the potential function, $\Phi(s)$. The transformed reward, $r'$, is calculated by augmenting the original environmental reward, $r$, with a shaping term: the discounted value of the subsequent state, $\gamma V(s_{t+1})$, minus the value of the current state, $V(s_t)$. The formula is expressed as:

$$r'(s_t, a_t, s_{t+1}) = r(s_t, a_t, s_{t+1}) + \gamma V(s_{t+1}) - V(s_t)$$
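As a minimal sketch of how the formula might be applied, the Python snippet below computes the shaped reward for each step of a short trajectory. The value table `V`, the state names, the trajectory, and the helper names `shaped_reward` and `discounted_return` are hypothetical and chosen only for illustration; in practice $V$ would typically be a learned estimate. The final lines check a known property of this kind of shaping: summed over a trajectory of length $T$, the shaping terms telescope, so the shaped discounted return differs from the original one only by $\gamma^T V(s_T) - V(s_0)$.

```python
def shaped_reward(r: float, v_s: float, v_next: float, gamma: float) -> float:
    """Return r' = r + gamma * V(s_{t+1}) - V(s_t)."""
    return r + gamma * v_next - v_s


def discounted_return(rewards, gamma: float) -> float:
    """Sum of gamma^t * r_t over a list of per-step rewards."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Hypothetical value estimates for a three-state chain (illustration only).
V = {"s0": 0.5, "s1": 1.0, "s2": 2.0}
gamma = 0.9

# One trajectory as (state, original reward, next state) tuples.
trajectory = [("s0", 0.0, "s1"), ("s1", 0.0, "s2")]

# Apply the shaping formula step by step.
shaped = [shaped_reward(r, V[s], V[s_next], gamma) for s, r, s_next in trajectory]

original_return = discounted_return([r for _, r, _ in trajectory], gamma)
shaped_return = discounted_return(shaped, gamma)

# Telescoping check: the intermediate V terms cancel, leaving only the
# boundary terms gamma^T * V(s_T) - V(s_0).
T = len(trajectory)
boundary = gamma ** T * V["s2"] - V["s0"]
assert abs(shaped_return - (original_return + boundary)) < 1e-9
```

Because the difference between the two returns depends only on the first and last states, not on the actions taken in between, the shaping term changes how reward is distributed across steps without changing which action sequences rank higher from a given start state.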


Updated 2026-05-02


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences