1Cademy - Evaluating a Policy Change

Learn Before

Policy Probability Ratio (Ratio Function)

Case Study

Evaluating a Policy Change

Based on the scenario provided, calculate the ratio of the new policy's action probability to the old policy's action probability. Then, explain what this ratio implies about how the observed reward should be used to evaluate the new policy.

Updated 2025-09-26

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

Increased Action Probability Condition
Policy Probability Ratio Less Than One
Bound Function for Policy Probability Ratio
Policy Probability Ratio Greater Than One
Upper-Bound Clipping Function for Policy Ratios
Evaluating a Policy Change
In an off-policy reinforcement learning scenario, an agent is in a specific state. The policy that originally collected the training data (the reference policy) selected a particular action with a probability of 0.2. The agent's current, updated policy would select that same action with a probability of 0.8. What does the resulting probability ratio imply about how the reward for this state-action pair should be treated during the policy update?
Interpreting Policy Changes
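
The numbers in the related question above (reference probability 0.2, updated probability 0.8) can be worked through in a few lines. This is a minimal sketch, not a definitive implementation: the function names `probability_ratio` and `clip_ratio` are hypothetical, and the clipping range `epsilon = 0.2` is an assumed value chosen only for illustration, since the page does not specify one.

```python
def probability_ratio(new_prob: float, old_prob: float) -> float:
    """Ratio of the updated policy's action probability to the reference policy's."""
    if old_prob <= 0.0:
        raise ValueError("the reference policy must give the action nonzero probability")
    return new_prob / old_prob


def clip_ratio(ratio: float, epsilon: float = 0.2) -> float:
    """Bound the ratio to [1 - epsilon, 1 + epsilon]; epsilon is an assumed value."""
    return max(1.0 - epsilon, min(ratio, 1.0 + epsilon))


# Values taken from the scenario: reference policy 0.2, updated policy 0.8.
ratio = probability_ratio(new_prob=0.8, old_prob=0.2)
clipped = clip_ratio(ratio)

print(f"ratio = {ratio:.1f}")
print(f"clipped ratio = {clipped:.1f}")
```

The ratio is 0.8 / 0.2 = 4, which is greater than one: the updated policy chooses this action four times as often as the policy that collected the data. In an importance-sampled estimate, the observed reward for this state-action pair would therefore be weighted up by a factor of 4. If an upper-bound clip is applied, as in the assumed range here, that weight is capped at 1 + epsilon so that a single sample cannot dominate the update.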