Reward Function in Reinforcement Learning

The reward function formally describes the feedback an agent receives from the environment, often denoted as $R$. Specifically, $r(s, a, s')$ represents the reward for taking action $a$ in state $s$ and transitioning to the next state $s'$. For a sequence of state-action pairs, the reward at a specific time step $t$ is written as $r_t = r(s_t, a_t, s_{t+1})$. In deterministic decision-making processes, where the next state $s_{t+1}$ is entirely determined by the current state $s_t$ and action $a_t$, the notation simplifies to $r(s_t, a_t)$.
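
The notation above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the source: the four-state chain environment, the goal state, the reward values, and the names `reward`, `step`, `reward_det`, and `rollout_rewards`. It shows the three-argument form $r(s, a, s')$, the per-step rewards $r_t$ along a trajectory, and how the two-argument form $r(s_t, a_t)$ falls out when transitions are deterministic.

```python
from typing import List

# Hypothetical toy setup: states 0..3 on a line, actions are -1 (left) or +1 (right).
GOAL_STATE = 3      # assumed goal, for illustration only
STEP_COST = -0.1    # assumed per-step penalty
GOAL_REWARD = 1.0   # assumed reward for arriving at the goal


def reward(s: int, a: int, s_next: int) -> float:
    """r(s, a, s'): reward for taking action a in state s and landing in s'."""
    return GOAL_REWARD if s_next == GOAL_STATE else STEP_COST


def step(s: int, a: int) -> int:
    """Deterministic transition: s_{t+1} is fully determined by (s_t, a_t)."""
    return max(0, min(GOAL_STATE, s + a))


def reward_det(s: int, a: int) -> float:
    """r(s, a): valid shorthand because the next state is a function of (s, a)."""
    return reward(s, a, step(s, a))


def rollout_rewards(s0: int, actions: List[int]) -> List[float]:
    """Collect r_t = r(s_t, a_t, s_{t+1}) for each time step t of a trajectory."""
    rewards = []
    s = s0
    for a in actions:
        s_next = step(s, a)
        rewards.append(reward(s, a, s_next))
        s = s_next
    return rewards


if __name__ == "__main__":
    print(rollout_rewards(0, [1, 1, 1]))  # rewards r_0, r_1, r_2
    print(reward_det(2, 1))               # same value as reward(2, 1, step(2, 1))
```

In a stochastic environment, `step` would sample the next state rather than compute it, so the two-argument shorthand would no longer identify a single reward and the full form $r(s, a, s')$ would be needed.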

Updated 2026-05-01

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences