Formula

Cumulative Reward of a Trajectory

The cumulative reward, also referred to as the return, for a specific state-action sequence (or trajectory) is determined by adding together all the individual rewards accumulated during that sequence. If we denote a trajectory consisting of $T$ time steps as $\tau = \{(s_1,a_1),...,(s_T,a_T)\}$, the total cumulative reward $R(\tau)$ is formally calculated as the sum of the rewards $r_t$ from the first step to the final step:

$$R(\tau) = \sum_{t=1}^{T} r_t$$
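
As a minimal sketch of this sum in Python: the function name `cumulative_reward` and the reward values are hypothetical, chosen only for illustration, and the trajectory is assumed to be represented simply by its list of per-step rewards $r_1, ..., r_T$.

```python
from typing import Sequence


def cumulative_reward(rewards: Sequence[float]) -> float:
    """Compute R(tau) as the plain sum of per-step rewards r_1..r_T."""
    return float(sum(rewards))


# Hypothetical rewards for a trajectory of T = 4 steps.
rewards = [1.0, 0.0, -0.5, 2.0]
total = cumulative_reward(rewards)
print(total)  # 2.5
```

Every step contributes equally here, since the formula applies no discounting or weighting to the individual rewards.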

Updated 2026-05-01

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences