1Cademy - In the context of updating a policy using an objective function with importance sampling, if the ratio of the current policy's probability to the reference policy's probability for a given action is greater than 1, this will always increase the likelihood of that action being selected in the subsequent policy update.

Learn Before

Policy Gradient Objective with Importance Sampling

True/False

In the context of updating a policy using an objective function with importance sampling, if the ratio of the current policy's probability to the reference policy's probability for a given action is greater than 1, this will always increase the likelihood of that action being selected in the subsequent policy update.
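One way to probe the statement is to take a single gradient-ascent step on an importance-sampled surrogate and see which way the action's probability moves. The sketch below is a toy illustration, not part of the original card: it assumes the common surrogate form `ratio * advantage` for one sampled action, a two-action softmax policy, and hypothetical names (`updated_probability`, `ref_prob`, `advantage`, `lr`).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def updated_probability(logits, action, ref_prob, advantage, lr=0.1):
    """Take one gradient-ascent step on ratio * advantage (assumed surrogate).

    Returns (ratio, probability before the step, probability after the step)
    for the sampled action.
    """
    probs = softmax(logits)
    ratio = probs[action] / ref_prob
    # d(ratio * A) / d(logit_k) = (A / ref_prob) * p_a * (1[k == a] - p_k)
    grads = [
        (advantage / ref_prob)
        * probs[action]
        * ((1.0 if k == action else 0.0) - probs[k])
        for k in range(len(logits))
    ]
    new_logits = [z + lr * g for z, g in zip(logits, grads)]
    return ratio, probs[action], softmax(new_logits)[action]


if __name__ == "__main__":
    # Same ratio (> 1) in both runs; only the sign of the advantage differs.
    for adv in (1.0, -1.0):
        ratio, before, after = updated_probability([1.0, 0.0], 0, 0.5, adv)
        direction = "up" if after > before else "down"
        print(f"advantage={adv:+.1f} ratio={ratio:.3f} probability moves {direction}")
```

In this toy setup the ratio is identical in both runs, yet the probability of the action moves in opposite directions. Under the assumed surrogate, the direction of the update follows the sign of the advantage, while the ratio only scales the step.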

Updated 2025-10-08

