Short Answer

Asymmetric Effect of Upper-Bound Clipping

In a policy gradient algorithm, a clipping function is defined as min(ratio, 1 + ε), where ratio is the probability of the action under the new policy divided by its probability under the old policy, and ε is a small positive hyperparameter. Consider two scenarios for a given state-action pair:

  1. The advantage is positive (+3.0) and the ratio is 1.8.
  2. The advantage is negative (-3.0) and the ratio is 1.8.

Assuming ε = 0.2, explain how the clipping function affects the magnitude of the policy update in each scenario and analyze the reasoning behind this differential treatment.
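
As a starting point for the analysis, the arithmetic for both scenarios can be sketched numerically. The sketch below is an illustration under stated assumptions, not a definitive implementation: the helper names `clip_upper` and `surrogate_terms` are hypothetical, and taking the minimum of the unclipped and clipped terms follows the common PPO-style surrogate objective, which the question itself does not spell out.

```python
def clip_upper(ratio: float, eps: float) -> float:
    """Upper-bound clipping exactly as stated in the question: min(ratio, 1 + eps)."""
    return min(ratio, 1.0 + eps)


def surrogate_terms(ratio: float, advantage: float, eps: float):
    """Return (unclipped, clipped, pessimistic) surrogate terms.

    Assumption: the objective keeps the smaller of the two terms,
    as in the PPO-style clipped surrogate. This outer min is not
    part of the question text.
    """
    unclipped = ratio * advantage
    clipped = clip_upper(ratio, eps) * advantage
    pessimistic = min(unclipped, clipped)
    return unclipped, clipped, pessimistic


if __name__ == "__main__":
    # The two scenarios from the question: ratio 1.8, eps 0.2, advantage +3.0 and -3.0.
    for advantage in (3.0, -3.0):
        u, c, p = surrogate_terms(ratio=1.8, advantage=advantage, eps=0.2)
        print(f"A={advantage:+.1f}  unclipped={u:+.1f}  clipped={c:+.1f}  min={p:+.1f}")
```

Under this assumed objective, the clipped term is selected when the advantage is positive, and the unclipped term is selected when the advantage is negative. Explaining why that difference matters for the size of the update is the analysis the question asks for.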

Updated 2025-10-04

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science