Formula

Reward Shaping Formula

In reward shaping, a new reward signal, the transformed reward function $r'(s_t, a_t, s_{t+1})$, is created by adding a shaping reward function $f(s_t, a_t, s_{t+1})$ to the environment's original reward function $r(s_t, a_t, s_{t+1})$:

$$r'(s_t, a_t, s_{t+1}) = r(s_t, a_t, s_{t+1}) + f(s_t, a_t, s_{t+1})$$

All three functions depend on the current state $s_t$, the action $a_t$, and the next state $s_{t+1}$. The shaping term $f$ gives the agent additional feedback beyond the environment's original reward.
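The additive relationship between the original reward, the shaping reward, and the transformed reward can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not from the source: the integer states on a line, the `GOAL` position, the sparse `env_reward`, and the `progress_bonus` shaping function are hypothetical. Only `shape_reward` mirrors the formula itself.

```python
from typing import Callable

State = int
Action = int
# A reward function takes (s_t, a_t, s_{t+1}) and returns a scalar.
RewardFn = Callable[[State, Action, State], float]


def shape_reward(r: RewardFn, f: RewardFn) -> RewardFn:
    """Build r'(s, a, s') = r(s, a, s') + f(s, a, s')."""
    def r_prime(s: State, a: Action, s_next: State) -> float:
        return r(s, a, s_next) + f(s, a, s_next)
    return r_prime


GOAL = 5  # hypothetical goal position on a 1-D line


def env_reward(s: State, a: Action, s_next: State) -> float:
    # Hypothetical sparse environment reward: 1.0 only when the goal is reached.
    return 1.0 if s_next == GOAL else 0.0


def progress_bonus(s: State, a: Action, s_next: State) -> float:
    # Hypothetical shaping reward: 0.1 per unit of distance closed to the goal,
    # negative when the agent moves away from it.
    return 0.1 * (abs(GOAL - s) - abs(GOAL - s_next))


shaped = shape_reward(env_reward, progress_bonus)

step_toward = shaped(2, +1, 3)   # original reward 0.0 plus a positive bonus
step_away = shaped(3, -1, 2)     # original reward 0.0 plus a negative bonus
reach_goal = shaped(4, +1, 5)    # original reward 1.0 plus a positive bonus
```

In this sketch the environment alone gives no signal until the goal is reached, while the shaped reward also distinguishes steps toward the goal from steps away from it. Whether a particular choice of $f$ is appropriate depends on the task; the bonus used here is only one possible illustration.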

Updated 2026-05-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences