Learn Before
Formula

Reward Transformation Formula

This formula defines a transformed reward function, r', based on an original reward function, r. The new reward is calculated by adding an arbitrary function, f, to the original reward. All functions depend on the current state (s_t), the action (a_t), and the subsequent state (s_{t+1}). The mathematical expression is: r'(s_t, a_t, s_{t+1}) = r(s_t, a_t, s_{t+1}) + f(s_t, a_t, s_{t+1}). This demonstrates how alternative reward functions can be generated, which is a core aspect of why reward models can be underdetermined.
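The composition can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from the source: the base reward `r`, the added term `f`, and the example transition values are all hypothetical choices made for demonstration.

```python
def r(s_t, a_t, s_next):
    # Hypothetical base reward: +1.0 for reaching the goal state 0.
    return 1.0 if s_next == 0 else 0.0

def f(s_t, a_t, s_next):
    # Hypothetical arbitrary added term: a small constant step penalty.
    return -0.1

def r_prime(s_t, a_t, s_next):
    # Transformed reward: r'(s_t, a_t, s_{t+1}) = r(...) + f(...)
    return r(s_t, a_t, s_next) + f(s_t, a_t, s_next)

print(r_prime(3, "left", 0))  # goal reached: 1.0 + (-0.1) = 0.9
print(r_prime(3, "left", 2))  # goal not reached: 0.0 + (-0.1) = -0.1
```

Because `f` is arbitrary, every choice of `f` yields a distinct reward function `r_prime` over the same transitions, which is exactly the underdetermination the formula illustrates.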


Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences