1Cademy - Calculating Trajectory Utility with Importance Sampling

Learn Before

Policy Gradient Objective with Importance Sampling

Case Study

Calculating Trajectory Utility with Importance Sampling

An agent's policy is being updated from a reference policy $\pi_{\theta_{\text{ref}}}$ to a new policy $\pi_{\theta}$. The utility of a trajectory is calculated by re-weighting the advantage $A(s_t, a_t)$ of each action by the importance sampling ratio, that is, the probability of the action under the new policy divided by its probability under the reference policy. The formula for the utility $U$ of a two-step trajectory is:

$U = \left( \frac{\pi_{\theta}(a_1|s_1)}{\pi_{\theta_{\text{ref}}}(a_1|s_1)} \cdot A(s_1, a_1) \right) + \left( \frac{\pi_{\theta}(a_2|s_2)}{\pi_{\theta_{\text{ref}}}(a_2|s_2)} \cdot A(s_2, a_2) \right)$

Based on the data in the case study below, calculate the total utility $U$ for this trajectory and explain what the final value implies for the policy update.
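The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the case study's solution: the function name `trajectory_utility` and all numeric values are hypothetical placeholders, since the case study data is not reproduced here.

```python
def trajectory_utility(pi_new, pi_ref, advantages):
    """Sum of (pi_new / pi_ref) * advantage over the steps of one trajectory.

    pi_new[t]     : probability of the taken action under the updated policy
    pi_ref[t]     : probability of the same action under the reference policy
    advantages[t] : advantage estimate A(s_t, a_t)
    """
    if not (len(pi_new) == len(pi_ref) == len(advantages)):
        raise ValueError("all inputs must have one entry per step")
    total = 0.0
    for p_new, p_ref, adv in zip(pi_new, pi_ref, advantages):
        if p_ref <= 0.0:
            raise ValueError("reference probability must be positive")
        ratio = p_new / p_ref  # importance sampling ratio for this step
        total += ratio * adv
    return total


# Hypothetical placeholder values (NOT the case study's data).
pi_new = [0.6, 0.2]       # assumed new-policy probabilities for a_1, a_2
pi_ref = [0.5, 0.4]       # assumed reference-policy probabilities
advantages = [2.0, -1.0]  # assumed advantages A(s_1, a_1), A(s_2, a_2)

U = trajectory_utility(pi_new, pi_ref, advantages)
print(f"U = {U:.2f}")
```

With these placeholder numbers the ratios are 1.2 and 0.5, so U is approximately 1.2 · 2.0 + 0.5 · (−1.0) = 1.9. The first step contributes more because the new policy assigns higher probability to an action with positive advantage, and the second step's negative contribution shrinks because the new policy assigns lower probability to an action with negative advantage. A positive total of this kind suggests the update is moving the policy in a favorable direction on this trajectory.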


Updated 2025-10-03
