Concept

Parameter Update at the Reference Policy Point in PPO

In Proximal Policy Optimization (PPO), the parameter update is analyzed at the point on the optimization surface where the current policy parameters θ equal the reference policy parameters θ_ref. At this point the two policies coincide, so the probability ratio between them is exactly 1. The point serves as the baseline for the update: a local approximation of the objective function is constructed around it and used to guide the optimization step.
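To make the role of the reference point concrete, the following is a minimal sketch rather than a definitive implementation. It assumes the standard clipped PPO surrogate, a single state with a softmax policy over three actions, and hand-picked advantage values. All names here (`softmax`, `clipped_surrogate`, `unclipped_surrogate`, `numerical_grad`, `eps`) are hypothetical and chosen only for illustration. The sketch checks numerically that at θ = θ_ref the clipping is inactive, so the gradient of the clipped surrogate matches the gradient of the unclipped one.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def unclipped_surrogate(theta, theta_ref, advantages):
    """Expected ratio-weighted advantage, with actions drawn from the reference policy."""
    pi, pi_ref = softmax(theta), softmax(theta_ref)
    return sum(p_ref * (p / p_ref) * adv
               for p, p_ref, adv in zip(pi, pi_ref, advantages))


def clipped_surrogate(theta, theta_ref, advantages, eps=0.2):
    """Clipped surrogate (assumed standard PPO form), same expectation as above."""
    pi, pi_ref = softmax(theta), softmax(theta_ref)
    value = 0.0
    for p, p_ref, adv in zip(pi, pi_ref, advantages):
        ratio = p / p_ref
        clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
        value += p_ref * min(ratio * adv, clipped_ratio * adv)
    return value


def numerical_grad(f, theta, h=1e-5):
    """Central finite-difference gradient of f at theta."""
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[k] += h
        down[k] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


# Illustrative values only (assumptions, not taken from the source).
theta_ref = [0.1, -0.3, 0.5]
advantages = [1.0, -0.5, 0.2]

# Evaluate both gradients at the reference point theta = theta_ref.
grad_clipped = numerical_grad(
    lambda t: clipped_surrogate(t, theta_ref, advantages), theta_ref)
grad_unclipped = numerical_grad(
    lambda t: unclipped_surrogate(t, theta_ref, advantages), theta_ref)
```

Because every ratio equals 1 at θ_ref and a small perturbation keeps it inside the interval [1 - eps, 1 + eps], the two objectives agree in a neighborhood of the reference point. This is why the local approximation built there can guide the first step of the update.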

Updated 2026-01-15

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences