Multiple Choice

A reinforcement learning agent is being updated. The current policy is denoted by $\pi_{\theta}$, and a batch of trajectory data has been collected using a previous, fixed policy, $\pi_{\theta_{\text{ref}}}$. To improve the current policy using this existing data, the following objective function is optimized: $L(\theta) = \mathbb{E}_{\tau \sim \pi_{\theta_{\text{ref}}}} \left[ \frac{\Pr_{\theta}(\tau)}{\Pr_{\theta_{\text{ref}}}(\tau)} R(\tau) \right]$. Which statement best analyzes the role of this objective function in the training process?
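The objective above is an importance-sampling estimate: trajectories are drawn from the fixed reference policy, and each return is reweighted by the probability ratio so the expectation approximates performance under the current policy. A minimal sketch of this estimator, assuming per-trajectory log-probabilities are available (the function name and toy data are illustrative, not from the question):

```python
import numpy as np

def importance_sampling_objective(logp_theta, logp_ref, returns):
    """Monte Carlo estimate of L(theta) from trajectories sampled
    under the reference policy pi_{theta_ref}.

    logp_theta: per-trajectory log Pr_theta(tau), shape (N,)
    logp_ref:   per-trajectory log Pr_ref(tau),   shape (N,)
    returns:    per-trajectory return R(tau),     shape (N,)
    """
    # Importance ratio Pr_theta(tau) / Pr_ref(tau), computed in log space
    # for numerical stability.
    ratios = np.exp(logp_theta - logp_ref)
    # Reweighted mean return approximates E_{tau ~ pi_theta}[R(tau)].
    return np.mean(ratios * returns)

# Sanity check: when theta == theta_ref, every ratio is 1 and the
# estimate reduces to the ordinary mean return over the batch.
logp = np.log(np.array([0.2, 0.5, 0.3]))
R = np.array([1.0, 2.0, 3.0])
print(importance_sampling_objective(logp, logp, R))  # mean return = 2.0
```

Note that this correction is only reliable while $\pi_{\theta}$ stays close to $\pi_{\theta_{\text{ref}}}$; large ratios make the estimate high-variance, which is why methods such as PPO clip them.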


Updated 2025-10-03


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science