Objective Function as Expected Cumulative Reward (Performance Function)
In reinforcement learning, the objective function J(θ), also known as the performance function, evaluates the effectiveness of a policy π_θ parameterized by θ. It is defined as the expected cumulative reward over all possible trajectories τ. The formula is commonly expressed as J(θ) = 𝔼_{τ∼π_θ}[R(τ)]. The notation τ ∼ π_θ signifies that the trajectory τ is generated by following the policy π_θ. Alternatively, this objective can be written as a sum over the space of all trajectories, weighted by the probability of each trajectory under the policy: J(θ) = Σ_τ P(τ|π_θ) R(τ). Here, R(τ) is the cumulative reward for a trajectory.
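The summation form of the objective can be sketched directly for a toy problem with an enumerable trajectory space. The probabilities and rewards below are invented for illustration and are not from the card:

```python
# Toy illustration of J(theta) = sum over trajectories of P(tau | pi_theta) * R(tau).
# Each entry is (probability of the trajectory under the policy, cumulative reward R(tau)).

def performance(trajectories):
    """Expected cumulative reward: sum_tau P(tau) * R(tau)."""
    return sum(p * r for p, r in trajectories)

# Three hypothetical trajectories; probabilities sum to 1.
toy = [(0.5, 4.0), (0.3, -2.0), (0.2, 10.0)]
print(performance(toy))  # 0.5*4 + 0.3*(-2) + 0.2*10 = 3.4
```

For continuous or very large trajectory spaces this sum becomes an expectation estimated by sampling trajectories from π_θ, which is exactly what the 𝔼_{τ∼π_θ} notation expresses.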

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Policy in the Context of LLMs
Objective Function as Expected Cumulative Reward (Performance Function)
An agent operates in an environment where sequences of events unfold over time. The agent's behavior is described by a policy, denoted π(a|s), which gives the probability of taking action a when in state s. The environment's dynamics are described by a transition function, P(s′|s, a), which gives the probability of moving to the next state s′ after taking action a in state s. The process begins from an initial state, s₀, drawn with probability P(s₀).
Consider the following specific two-step sequence of events (a trajectory):
- The process starts in state s₀.
- The agent takes action a₀.
- The environment transitions to state s₁.
- The agent takes action a₁.
- The environment transitions to state s₂.
Which expression correctly represents the probability of this entire specific trajectory occurring?
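The reasoning this question tests is the chain-rule factorization of a trajectory's probability into the initial-state probability, then alternating policy and transition factors. A minimal sketch, with all numeric probabilities invented for illustration:

```python
# P(tau) = P(s0) * pi(a0|s0) * P(s1|s0,a0) * pi(a1|s1) * P(s2|s1,a1)
# The values below are hypothetical; only the factorization matters.

def trajectory_probability(p_s0, steps):
    """steps: list of (pi(a_t|s_t), P(s_{t+1}|s_t, a_t)) pairs, one per time step."""
    p = p_s0
    for pi_a, p_next in steps:
        p *= pi_a * p_next
    return p

p = trajectory_probability(1.0, [(0.6, 0.9), (0.5, 0.8)])
print(p)  # 1.0 * (0.6 * 0.9) * (0.5 * 0.8) = 0.216
```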
Diagnosing a Faulty Sequence Generation Process
When modeling the generation of a sequence of states and actions as a Markov Decision Process, the probability of transitioning to a new state at any given step depends on the complete history of all states and actions that have occurred since the beginning of the sequence.
Notational Variations in State-Action Sequences (Trajectories)
An agent is generating a sequence by interacting with an environment. For a single time step, starting from state
s_t, arrange the following events in the correct logical order.
Objective Function as Expected Cumulative Reward (Performance Function)
An agent is being trained to find the best route through a system. It is presented with two options:
- Route 1: Provides a consistent, small positive reward at every step, resulting in a total reward of +15 for the entire route.
- Route 2: Starts with a step that gives a negative reward (a penalty) of -5, but subsequent steps lead to very high rewards, resulting in a total reward of +50 for the entire route.
An agent that has been successfully trained according to the primary objective of its learning framework will learn to choose Route 2. Which of the following statements best explains why?
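The comparison the card describes reduces to summing each route's rewards, since the objective is cumulative reward rather than any single step's reward. A small sketch, where the per-step reward lists are invented to match the stated totals of +15 and +50:

```python
# Hypothetical per-step rewards consistent with the card's totals.
route_1 = [1] * 15          # steady small rewards, total +15
route_2 = [-5] + [11] * 5   # early penalty, then large rewards, total +50

def cumulative_reward(rewards):
    """Total (undiscounted) reward over the route."""
    return sum(rewards)

best = max([route_1, route_2], key=cumulative_reward)
print(cumulative_reward(route_1), cumulative_reward(route_2))  # 15 50
```

An agent maximizing cumulative reward prefers route_2 despite its initial penalty, because the objective scores whole trajectories, not individual steps.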
Analysis of a Suboptimal Agent Policy
An agent is learning to play a game where the objective is to get the highest possible final score. At a critical decision point, the agent chooses an action that yields an immediate reward of 0, passing up an alternative action that would have given an immediate reward of +10. This decision is necessarily an indication that the agent's policy is flawed and not aligned with the primary goal of its learning framework.
Learn After
Training Objective as Maximization of the Performance Function
Derivation of the Policy Gradient Objective Function
Off-Policy Objective Function with Importance Sampling
An agent is operating under a policy parameterized by θ. This policy can result in one of two possible trajectories. Trajectory A has a total reward of 20 and a 70% probability of occurring. Trajectory B has a total reward of -10 and a 30% probability of occurring. Given that the performance of a policy is measured by the expected cumulative reward over all possible trajectories (J(θ)), what is the value of the performance function for this policy?
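The expected-value arithmetic behind this card can be checked in one line, using the probabilities and rewards stated above:

```python
# J(theta) = P(A) * R(A) + P(B) * R(B)
j = 0.7 * 20 + 0.3 * (-10)
print(j)  # 14 - 3 = 11.0
```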
Critique of the Expected Reward Objective
On-Policy Objective Function (Performance Measure)
Policy Performance Comparison