A research team is training an agent and has a policy represented by parameters θ_current. To evaluate the performance of this policy using its on-policy objective function, J(θ_current), the team can use a large, pre-existing dataset of trajectories that were collected while the agent was operating under a slightly older set of parameters, θ_previous.
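The distinction the stem probes — whether trajectories gathered under θ_previous can estimate the on-policy objective J(θ_current) — can be made concrete with a toy sketch. A naive average of rewards from the old dataset estimates J(θ_previous), not J(θ_current); reweighting each sample by the likelihood ratio π_current(a)/π_previous(a) (importance sampling) corrects for the mismatch. The one-step environment, reward table, and policy names below are all illustrative assumptions, not part of the original question:

```python
import random

random.seed(0)

# Toy one-step environment: two actions with fixed rewards (assumed for illustration).
REWARDS = {0: 1.0, 1: 3.0}

def sample_action(pi):
    """Sample an action from a policy given as {action: probability}."""
    return random.choices(list(pi), weights=list(pi.values()))[0]

pi_prev = {0: 0.7, 1: 0.3}   # behaviour policy (theta_previous)
pi_curr = {0: 0.2, 1: 0.8}   # target policy (theta_current)

# True on-policy value of the current policy: J = sum_a pi(a) * r(a)
j_true = sum(p * REWARDS[a] for a, p in pi_curr.items())

# Data was collected under pi_prev. Averaging it directly is biased toward
# J(theta_previous); the importance weight pi_curr(a)/pi_prev(a) fixes this.
n = 100_000
naive = 0.0
weighted = 0.0
for _ in range(n):
    a = sample_action(pi_prev)
    r = REWARDS[a]
    naive += r / n                                  # estimates J(theta_previous)
    weighted += (pi_curr[a] / pi_prev[a]) * r / n   # importance-sampled J(theta_current)

print(f"true J(theta_current)       = {j_true:.3f}")
print(f"naive average of old data   = {naive:.3f}")
print(f"importance-weighted average = {weighted:.3f}")
```

With these assumed policies the naive average concentrates near 1.6 (the old policy's value) while the importance-weighted estimate concentrates near the true 2.6, which is why reusing an off-policy dataset for J(θ_current) requires the correction.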
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Equivalence of the Surrogate Objective and the On-Policy Objective
A reinforcement learning agent has developed a new policy, denoted as π_new, for navigating a maze. The goal is to accurately estimate the performance of this specific policy using its on-policy objective function, which is defined as the expected cumulative reward over trajectories generated by the policy itself. Which of the following procedures correctly describes how to gather data and compute this estimate?
Evaluating a New Robotic Arm Policy