Learn Before
In the context of training an agent, if the ratio of the current policy's probability to a reference policy's probability for a specific action is 0.7, this indicates that the agent has been updated to favor this action less than it did previously, since a ratio below 1 means the current policy assigns the action a lower probability than the reference policy does.
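A minimal sketch of how such a ratio is computed and read, using illustrative probability values chosen to produce the 0.7 ratio above (the variable names and numbers are hypothetical, not from any specific training setup):

# Hypothetical probabilities for one action in one state (illustrative values).
p_current = 0.28    # probability under the updated (current) policy
p_reference = 0.40  # probability under the frozen reference policy

ratio = p_current / p_reference  # 0.28 / 0.40 = 0.7

# A ratio below 1 means the update made the action less likely;
# a ratio above 1 would mean the action became more likely.
if ratio < 1:
    print(f"ratio = {ratio:.2f}: the current policy favors this action less")
elif ratio > 1:
    print(f"ratio = {ratio:.2f}: the current policy favors this action more")
else:
    print("ratio = 1.00: the action's probability is unchanged")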
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An AI agent is being updated. From a particular state, the original 'reference' version of the agent had a 40% chance of selecting action 'X'. The new 'current' version of the agent, after some training, now has only a 10% chance of selecting action 'X' from that same state. Based on this information, what can be concluded about the ratio of the current policy's probability to the reference policy's probability for taking action 'X'? (A worked version of this computation appears after this list.)
Self-Driving Car Policy Update Analysis
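For the first related question above, the arithmetic works out as follows (a worked computation based on the numbers given in the question, not the platform's official answer):

ratio = P_current(X | s) / P_reference(X | s) = 0.10 / 0.40 = 0.25

Because 0.25 is less than 1, the updated agent is only a quarter as likely to select action 'X' as the reference version was.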