Consider an off-policy evaluation scenario where the performance of a 'target' policy is estimated using data collected from a 'reference' policy. If the target policy is identical to the reference policy, the importance sampling weight used to adjust the reward of every possible trajectory will be exactly 1, because the weight is the ratio of a trajectory's probability under the target policy to its probability under the reference policy, and identical policies assign every trajectory identical probabilities.
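A minimal sketch of that identity, assuming tabular policies stored as per-state action-probability dictionaries (the function name, states, and numbers are illustrative, not from the source): the trajectory weight is the product of per-step probability ratios, which is exactly 1 whenever the two policies agree.

```python
def trajectory_weight(target_policy, reference_policy, trajectory):
    """Importance sampling weight for one trajectory: the product over
    its steps of pi_target(action | state) / pi_reference(action | state)."""
    weight = 1.0
    for state, action in trajectory:
        weight *= target_policy[state][action] / reference_policy[state][action]
    return weight

# Reference policy as per-state action probabilities (illustrative numbers).
reference = {"s0": {"a": 0.5, "b": 0.5}, "s1": {"a": 0.8, "b": 0.2}}
# A target policy identical to the reference: every per-step ratio is 1.
target = {state: dict(probs) for state, probs in reference.items()}

trajectory = [("s0", "a"), ("s1", "b")]
print(trajectory_weight(target, reference, trajectory))  # 1.0
```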
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Surrogate Objective in Reinforcement Learning
Equivalence of the Surrogate Objective and the On-Policy Objective
An agent's performance is being evaluated using a set of recorded experiences (trajectories) generated by an older, reference policy. The new, target policy being evaluated makes a specific high-reward trajectory significantly less probable than the reference policy did. How will the contribution of this trajectory be adjusted when estimating the new target policy's performance? (See the sketch after this list.)
Off-Policy Performance Estimation
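A hedged sketch of the adjustment asked about in the related question above (the function name, probabilities, and reward are hypothetical): the logged reward is multiplied by the trajectory's probability ratio, so a trajectory the target policy makes less probable is down-weighted below its recorded value.

```python
def weighted_contribution(target_prob, reference_prob, reward):
    """Contribution of one logged trajectory to the off-policy estimate:
    its reward scaled by the ratio pi_target(tau) / pi_reference(tau)."""
    return (target_prob / reference_prob) * reward

# Hypothetical high-reward trajectory: likely under the reference policy
# that logged it, much less likely under the new target policy.
contribution = weighted_contribution(target_prob=0.02, reference_prob=0.20, reward=10.0)
print(contribution)  # ~1.0: the weight of 0.1 scales the reward of 10 down
# to about 1, so this trajectory contributes far less to the target
# policy's estimated performance.
```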