Learn Before
In the context of training an agent, if the ratio of the current policy's probability to a reference policy's probability for a specific action is 0.7, this indicates that the agent has been updated to favor this action less than it did previously, since a ratio below 1 means the current policy assigns the action a lower probability than the reference policy does.
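A minimal sketch of how such a ratio is computed and read, using illustrative probability values chosen to produce the 0.7 ratio above (the variable names and numbers are hypothetical, not from any specific training setup):

# Hypothetical probabilities for one action in one state (illustrative values).
p_current = 0.28    # probability under the updated (current) policy
p_reference = 0.40  # probability under the frozen reference policy

ratio = p_current / p_reference  # 0.28 / 0.40 = 0.7

# A ratio below 1 means the update made the action less likely;
# a ratio above 1 would mean the action became more likely.
if ratio < 1:
    print(f"ratio = {ratio:.2f}: the current policy favors this action less")
elif ratio > 1:
    print(f"ratio = {ratio:.2f}: the current policy favors this action more")
else:
    print("ratio = 1.00: the action's probability is unchanged")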
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An AI agent is being updated. From a particular state, the original 'reference' version of the agent had a 40% chance of selecting action 'X'. The new 'current' version of the agent, after some training, now has only a 10% chance of selecting action 'X' from that same state. Based on this information, what can be concluded about the ratio of the current policy's probability to the reference policy's probability for taking action 'X'? (A worked version of this computation appears after this list.)
Self-Driving Car Policy Update Analysis
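For the first related question above, the arithmetic works out as follows (a worked computation based on the numbers given in the question, not the platform's official answer):

ratio = P_current(X | s) / P_reference(X | s) = 0.10 / 0.40 = 0.25

Because 0.25 is less than 1, the updated agent is only a quarter as likely to select action 'X' as the reference version was.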