1Cademy - Consider a reinforcement learning agent being trained. For a specific state-action pair, the ratio of the action's probability under the newly updated policy to its probability under the original reference policy is calculated to be 0.75. This result signifies that the training update has made the agent more likely to select this action in the future.

Learn Before

Increased Action Probability Condition

True/False

Consider a reinforcement learning agent being trained. For a specific state-action pair, the ratio of the action's probability under the newly updated policy to its probability under the original reference policy is calculated to be 0.75. This result signifies that the training update has made the agent more likely to select this action in the future.
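One way to check a statement like this is to compute the ratio directly and compare it with 1. The sketch below is illustrative only: the function names `probability_ratio` and `describe_change` are hypothetical, and the probabilities 0.5 and 0.375 are assumed values chosen solely because they give a ratio of 0.75. They do not come from the question.

```python
def probability_ratio(p_new: float, p_ref: float) -> float:
    """Ratio of an action's probability under the updated policy
    to its probability under the reference policy."""
    if p_ref <= 0.0:
        raise ValueError("reference probability must be positive")
    return p_new / p_ref


def describe_change(ratio: float) -> str:
    """Interpret the ratio: above 1 means the action became more likely,
    below 1 means it became less likely, exactly 1 means no change."""
    if ratio > 1.0:
        return "more likely"
    if ratio < 1.0:
        return "less likely"
    return "unchanged"


# Assumed illustrative values (not from the question itself).
p_ref = 0.5    # probability of the action under the reference policy
p_new = 0.375  # probability of the same action under the updated policy

ratio = probability_ratio(p_new, p_ref)
print(ratio, describe_change(ratio))
```

Under these assumed numbers the updated policy assigns the action a smaller probability than the reference policy did, which is what any ratio below 1 indicates.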

Updated 2025-10-08
