1Cademy - An AI agent is being updated. From a particular state, the original 'reference' version of the agent had a 40% chance of selecting action 'X'. The new 'current' version of the agent, after some training, now has only a 10% chance of selecting action 'X' from that same state. Based on this information, what can be concluded about the ratio of the current policy's probability to the reference policy's probability for taking action 'X'?

Learn Before

Policy Probability Ratio Less Than One

Multiple Choice

An AI agent is being updated. From a particular state, the original 'reference' version of the agent had a 40% chance of selecting action 'X'. The new 'current' version of the agent, after some training, now has only a 10% chance of selecting action 'X' from that same state. Based on this information, what can be concluded about the ratio of the current policy's probability to the reference policy's probability for taking action 'X'?
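
The arithmetic behind the question can be sketched in a few lines. This is a minimal illustration, assuming the ratio is defined as the current policy's probability divided by the reference policy's probability for the same state and action, as the question's wording suggests. The function name `probability_ratio` is hypothetical and not taken from any library.

```python
def probability_ratio(current_prob: float, reference_prob: float) -> float:
    """Return current_prob / reference_prob for one (state, action) pair.

    Hypothetical helper for illustration; the reference probability must be
    positive for the ratio to be defined.
    """
    if reference_prob <= 0.0:
        raise ValueError("reference probability must be greater than zero")
    return current_prob / reference_prob


# Values from the question: the reference policy chose 'X' 40% of the time,
# the current policy chooses it 10% of the time.
ratio = probability_ratio(current_prob=0.10, reference_prob=0.40)

print(f"ratio = {ratio:.2f}")
print("action is now less likely" if ratio < 1 else "action is not less likely")
```

Under that assumed definition, the ratio is 0.10 / 0.40 = 0.25. A value below 1 indicates that training has made action 'X' less likely than it was under the reference policy, which matches the prerequisite concept listed above.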

Updated 2025-09-26

