1Cademy - Proximal Policy Optimization (PPO)

Learn Before

Incorporating Policy Divergence Penalty into the Clipped Surrogate Objective

Concept

Proximal Policy Optimization (PPO)

Proximal Policy Optimization (PPO) is a widely used reinforcement learning training method characterized by its composite objective function, which combines a clipped surrogate objective with a policy divergence penalty. The clipping limits how far a single update can move the policy away from the one that generated the training data, and the penalty further discourages large divergence between the two. PPO is applied in the training of Large Language Models (LLMs) as well as in many other fields.
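As a rough illustration of how the two terms fit together, the following minimal sketch computes one common form of the composite objective over a batch of sampled actions. The function name ppo_objective, the parameters clip_eps and beta, their default values, and the choice of a simple sample-based divergence estimate (old log-probability minus new log-probability) are all assumptions made for this example, not details taken from this page.

```python
import math

def ppo_objective(logp_new, logp_old, advantages, clip_eps=0.2, beta=0.01):
    """Average clipped surrogate objective minus a divergence penalty.

    logp_new / logp_old: log-probabilities of the sampled actions under the
    current policy and the data-collecting (old) policy.
    advantages: advantage estimates for the same samples.
    clip_eps and beta are illustrative defaults (assumptions).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the current and old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Clipped surrogate: the more pessimistic of the two estimates.
        surrogate = min(ratio * adv, clipped_ratio * adv)
        # Per-sample divergence estimate (assumed form: log pi_old - log pi_new).
        kl_sample = lp_old - lp_new
        total += surrogate - beta * kl_sample
    return total / len(advantages)
```

When the current and old policies assign the same log-probabilities, the ratio is 1 and the penalty vanishes, so the objective reduces to the mean advantage. In practice this quantity would be maximized with an automatic-differentiation library rather than evaluated in plain Python.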

Updated 2026-05-02
