Learn Before
  • Policy Probability Ratio Greater Than One

Case Study

Analyzing Policy Updates in a Game-Playing AI

Analyze the following scenario from a reinforcement learning training process and explain the implications of the observed policy probability ratio.
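The policy probability ratio compares how likely the updated policy is to choose an action in a given state with how likely the old policy (the one that collected the data) was to choose it: r(s, a) = π_new(a | s) / π_old(a | s). Below is a minimal Python sketch. It assumes each policy is given as a dictionary of action probabilities for a single state. The helper names `probability_ratio` and `interpret`, and the game actions and numbers, are hypothetical illustrations rather than details from the case study.

```python
def probability_ratio(new_probs: dict[str, float],
                      old_probs: dict[str, float],
                      action: str) -> float:
    """Return pi_new(action | s) / pi_old(action | s) for one state s."""
    p_new, p_old = new_probs[action], old_probs[action]
    if p_old <= 0.0:
        # The ratio is undefined if the old policy never chose the action.
        raise ValueError("old policy must give the action nonzero probability")
    return p_new / p_old


def interpret(ratio: float, tol: float = 1e-9) -> str:
    """Map a ratio to its qualitative meaning relative to the old policy."""
    if ratio > 1.0 + tol:
        return "more likely under the new policy"
    if ratio < 1.0 - tol:
        return "less likely under the new policy"
    return "unchanged"


# Hypothetical action probabilities in one game state (each row sums to 1).
old_policy = {"attack": 0.25, "defend": 0.50, "retreat": 0.25}
new_policy = {"attack": 0.50, "defend": 0.40, "retreat": 0.10}

for a in old_policy:
    r = probability_ratio(new_policy, old_policy, a)
    print(f"{a}: ratio={r:.2f} ({interpret(r)})")
```

Here 'attack' has a ratio of 2.0, so the update doubled its probability in this state. The other two actions have ratios below 1 because the probabilities within a state must still sum to 1. Many policy-gradient implementations compute the same quantity as exp(log π_new - log π_old) for numerical stability. The direct division is kept here for readability.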


Updated 2025-10-04

Contributors are:

Gemini AI

Who are from:

Google

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

Related
  • An autonomous agent is being trained to navigate a maze. At a specific intersection (a 'state'), it can either 'turn left' or 'turn right' (the 'actions'). We compare the agent's current decision-making strategy to its initial, less-developed strategy. For the action 'turn left' at this intersection, the ratio of its probability under the current strategy to its probability under the initial strategy is 2.5. What is the most accurate interpretation of this value?

  • An AI agent is being trained to play a video game. The training process aims to increase the likelihood that the agent performs a specific beneficial action, 'use health potion', when its health is low. After a successful training update that achieves this goal, the ratio of the probability of 'use health potion' under the new policy to its probability under the old policy will be less than 1.
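The maze item can be checked with simple arithmetic. The state has exactly two actions, so the probabilities of 'turn left' and 'turn right' sum to 1 under both policies. A ratio of 2.5 for 'turn left' means its new probability is 2.5 times its old one. That is only possible if the old probability was at most 1/2.5 = 0.4, and it forces the ratio for 'turn right' below 1. The Python sketch below assumes a hypothetical old probability of 0.2 for 'turn left', a value the item does not give. The helper `two_action_update` is likewise an illustrative name.

```python
def two_action_update(p_old_left: float, ratio_left: float) -> dict[str, float]:
    """For a state with exactly two actions, derive the new policy and the
    ratio for 'turn right' from the ratio observed for 'turn left'."""
    p_new_left = ratio_left * p_old_left
    if not 0.0 <= p_new_left <= 1.0:
        raise ValueError("ratio and old probability imply an invalid new probability")
    p_old_right = 1.0 - p_old_left  # the two actions' probabilities sum to 1
    p_new_right = 1.0 - p_new_left
    return {
        "p_new_left": p_new_left,
        "p_new_right": p_new_right,
        "ratio_right": p_new_right / p_old_right,
    }


# Largest old probability of 'turn left' compatible with a ratio of 2.5.
max_old_left = 1.0 / 2.5

# Hypothetical old probability of 0.2 for 'turn left'.
result = two_action_update(p_old_left=0.2, ratio_left=2.5)
print(f"max old P(turn left) = {max_old_left:.2f}")
print({k: round(v, 3) for k, v in result.items()})
```

With these assumed numbers, 'turn left' rises from 0.2 to 0.5, and 'turn right' falls from 0.8 to 0.5, giving a ratio of 0.625. The same reasoning applies to the health-potion item. If an update makes an action more probable, its ratio is above 1.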

© 1Cademy 2026