On-Policy Objective Function (Performance Measure)
The performance of a policy in reinforcement learning is measured by an objective function, J(θ). This function is defined as the expected cumulative reward, R(τ), over the distribution of trajectories τ generated by following the policy π_θ. The goal of the agent is to find the policy parameters θ that maximize this value. The formula is:

$$J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[R(\tau)\right]$$
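Written out for a discrete trajectory space, the expectation is a probability-weighted sum over trajectories. The per-step form below assumes a finite horizon T and per-step rewards r_t; that notation is introduced here for illustration and is not from the original card:

$$J(\theta) = \sum_{\tau} P(\tau \mid \theta)\, R(\tau), \qquad R(\tau) = \sum_{t=0}^{T-1} r_t$$

The practice questions in the Related and Learn After lists below all reduce to evaluating or estimating this weighted sum.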
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Training Objective as Maximization of the Performance Function
Derivation of the Policy Gradient Objective Function
Off-Policy Objective Function with Importance Sampling
An agent is operating under a policy parameterized by θ. This policy can result in one of two possible trajectories. Trajectory A has a total reward of 20 and a 70% probability of occurring. Trajectory B has a total reward of -10 and a 30% probability of occurring. Given that the performance of a policy is measured by the expected cumulative reward over all possible trajectories (J(θ)), what is the value of the performance function for this policy?
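For reference, plugging the question's numbers into the expected-reward formula above gives:

$$J(\theta) = 0.7 \times 20 + 0.3 \times (-10) = 14 - 3 = 11$$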
Critique of the Expected Reward Objective
Policy Performance Comparison
Learn After
Equivalence of the Surrogate Objective and the On-Policy Objective
A reinforcement learning agent has developed a new policy, denoted as π_new, for navigating a maze. The goal is to accurately estimate the performance of this specific policy using its on-policy objective function, which is defined as the expected cumulative reward over trajectories generated by the policy itself. Which of the following procedures correctly describes how to gather data and compute this estimate?
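The correct on-policy procedure is to generate fresh trajectories by running π_new itself and average their total rewards. A minimal Monte Carlo sketch of that procedure, assuming a Gymnasium-style environment API (env.reset(), env.step()) and a hypothetical pi_new(state) function that samples an action; both names are assumptions, not part of the original card:

import numpy as np

def estimate_on_policy_return(env, pi_new, num_trajectories=1000, max_steps=500):
    """Monte Carlo estimate of J(θ) for pi_new: average the total reward
    of trajectories generated by running pi_new itself (on-policy data)."""
    returns = []
    for _ in range(num_trajectories):
        state, _ = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            action = pi_new(state)  # sample an action from the policy under evaluation
            state, reward, terminated, truncated, _ = env.step(action)
            total_reward += reward
            if terminated or truncated:
                break
        returns.append(total_reward)
    # The sample mean over trajectories drawn from pi_new approximates J(θ_new).
    return float(np.mean(returns))

Because every trajectory is sampled from π_new's own distribution, the sample mean converges to J(θ_new) as num_trajectories grows; no reweighting is needed.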
Evaluating a New Robotic Arm Policy
A research team is training an agent and has a policy represented by parameters θ_current. To evaluate the performance of this policy using its on-policy objective function, J(θ_current), the team can use a large, pre-existing dataset of trajectories that were collected while the agent was operating under a slightly older set of parameters, θ_previous.
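As stated, averaging returns from that dataset estimates J(θ_previous), not J(θ_current), because the trajectories were drawn from the older policy's distribution. Reusing them requires the importance-sampling correction referenced in "Off-Policy Objective Function with Importance Sampling" above; a sketch of the reweighted estimator, with the ratio notation assumed here:

$$J(\theta_{\text{current}}) = \mathbb{E}_{\tau \sim \pi_{\theta_{\text{previous}}}}\left[\frac{P(\tau \mid \theta_{\text{current}})}{P(\tau \mid \theta_{\text{previous}})}\, R(\tau)\right]$$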