1Cademy - An agent is operating under a policy parameterized by $\theta$. This policy can result in one of two possible trajectories. Trajectory A has a total reward of 20 and a 70% probability of occurring. Trajectory B has a total reward of -10 and a 30% probability of occurring. Given that the performance of a policy is measured by the expected cumulative reward over all possible trajectories ($J(\theta) = \sum_{\tau} \text{Pr}_{\theta}(\tau)R(\tau)$), what is the value of the performance function $J(\theta)$ for this policy?

Learn Before

Objective Function as Expected Cumulative Reward (Performance Function)

Multiple Choice

An agent is operating under a policy parameterized by $\theta$. This policy can result in one of two possible trajectories. Trajectory A has a total reward of 20 and a 70% probability of occurring. Trajectory B has a total reward of -10 and a 30% probability of occurring. Given that the performance of a policy is measured by the expected cumulative reward over all possible trajectories ($J(\theta) = \sum_{\tau} \text{Pr}_{\theta}(\tau)R(\tau)$), what is the value of the performance function $J(\theta)$ for this policy?
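
The sum in the definition of $J(\theta)$ can be evaluated directly from the numbers in the question. The following is a minimal sketch, not part of the original question; the names `trajectories` and `expected_return` are hypothetical, and each trajectory is assumed to be represented as a (probability, total reward) pair.

```python
import math

# Hypothetical representation: each trajectory is a (probability, total_reward) pair.
trajectories = [
    (0.7, 20.0),   # Trajectory A: reward 20, probability 70%
    (0.3, -10.0),  # Trajectory B: reward -10, probability 30%
]

def expected_return(trajs):
    """Compute J(theta) = sum over tau of Pr_theta(tau) * R(tau)."""
    # Sanity check: the trajectory probabilities should form a distribution.
    total_prob = sum(p for p, _ in trajs)
    if not math.isclose(total_prob, 1.0):
        raise ValueError("trajectory probabilities must sum to 1")
    # Weight each trajectory's total reward by its probability and add them up.
    return sum(p * r for p, r in trajs)

# J = 0.7 * 20 + 0.3 * (-10) = 14 - 3 = 11
J = expected_return(trajectories)
print(J)
```

The negative reward of Trajectory B lowers the expectation, but its smaller probability means it only subtracts 3 from the 14 contributed by Trajectory A.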

Updated 2025-10-02
