A reinforcement learning agent is being trained using a utility function that incorporates an upper-bound clip on the policy probability ratio, defined as min(ratio, 1 + ε), where ε is a small positive constant. Consider two distinct actions taken during an episode:

- Action A: Has a large positive advantage, and its probability ratio is 2.0.
- Action B: Has a large negative advantage, and its probability ratio is 0.1.

Assuming ε = 0.2, how does this specific clipping mechanism influence the policy update derived from these two actions?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Related
Incorporating Policy Divergence Penalty into the Clipped Surrogate Objective
PPO Clipped Objective for Language Models
Stabilizing Policy Gradient Training
Answer
A utility function that modifies the policy probability ratio r_t using the operation min(r_t, 1 + ε) is primarily intended to mitigate training instability caused by over-aggressive updates toward actions that are discovered to be substantially better than under the reference policy (i.e., actions with a large positive advantage whose probability the new policy has already increased). Once r_t exceeds 1 + ε, the clipped term becomes a constant, its gradient vanishes, and the update can no longer push that action's probability higher.

With ε = 0.2, the bound is 1.2. Action A's ratio of 2.0 exceeds the bound, so it is clipped to 1.2: its contribution to the objective is capped and its gradient is zeroed, halting any further increase in Action A's probability from this sample. Action B's ratio of 0.1 lies below the bound, so min(0.1, 1.2) = 0.1 leaves it untouched: the full gradient flows, and the update continues to reduce Action B's probability, as its large negative advantage dictates.
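To make the asymmetry concrete, here is a minimal PyTorch sketch of the one-sided clip described above. The advantage values of +3.0 and -3.0 are illustrative assumptions, since the question only specifies "large positive" and "large negative".

```python
import torch

EPS = 0.2  # clipping constant from the question

def upper_clipped_objective(ratio: torch.Tensor, advantage: float) -> torch.Tensor:
    # One-sided upper-bound clip from the question: min(ratio, 1 + eps) * advantage
    return torch.clamp(ratio, max=1.0 + EPS) * advantage

# Advantages of +/-3.0 are assumed for illustration only.
cases = [("Action A", 2.0, +3.0), ("Action B", 0.1, -3.0)]

for name, r, adv in cases:
    ratio = torch.tensor(r, requires_grad=True)
    obj = upper_clipped_objective(ratio, adv)
    obj.backward()  # gradient of the objective with respect to the ratio
    print(f"{name}: objective = {obj.item():+.2f}, d(obj)/d(ratio) = {ratio.grad.item():+.2f}")

# Expected output:
#   Action A: objective = +3.60, d(obj)/d(ratio) = +0.00  (ratio clipped at 1.2 -> gradient blocked)
#   Action B: objective = -0.30, d(obj)/d(ratio) = -3.00  (ratio unclipped -> full gradient flows)
```

For contrast, the full PPO objective takes min(r_t · A, clip(r_t, 1 − ε, 1 + ε) · A), so the penalty for negative-advantage actions with inflated ratios is never capped; the one-sided min(r_t, 1 + ε) used here corresponds to the positive-advantage branch of that objective.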