Concept

Penalty-Based Trust Region Implementation

A common way to implement a trust region is to add a penalty term to the objective function. The penalty limits the size of each policy update by penalizing large deviations from a reference policy. It is computed with a divergence measure, such as the Kullback-Leibler (KL) divergence, that quantifies how far the current policy has moved from the reference. A coefficient, often written beta, sets how strongly this penalty is weighted against the original objective. Updates that would carry the policy outside the trust region therefore lower the penalized objective and are discouraged.
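The effect of the penalty can be illustrated with a toy example. This is a minimal sketch, assuming a single state with a discrete action distribution, a known reward per action, and KL(pi || pi_ref) as the divergence. The penalized objective is then J(pi) = E_{a ~ pi}[r(a)] - beta * KL(pi || pi_ref). The function names (`kl_divergence`, `penalized_objective`) and the numbers are hypothetical and chosen only for illustration. Without the penalty (beta = 0), the policy that moves furthest toward the rewarded action scores best. With beta = 1, the smaller, "in-region" update scores higher.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    Terms with p_i == 0 contribute nothing (0 * log 0 is taken as 0).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def penalized_objective(policy, ref_policy, rewards, beta):
    """Expected reward under `policy` minus beta * KL(policy || ref_policy).

    Hypothetical helper for illustration: a larger beta keeps the
    optimum closer to the reference policy (a tighter trust region).
    """
    expected_reward = sum(p * r for p, r in zip(policy, rewards))
    return expected_reward - beta * kl_divergence(policy, ref_policy)


# Toy setup (assumed values): two actions, only the first is rewarded.
ref = [0.5, 0.5]            # reference policy
rewards = [1.0, 0.0]        # per-action reward
conservative = [0.6, 0.4]   # small update away from the reference
aggressive = [0.99, 0.01]   # large update away from the reference

for beta in (0.0, 1.0):
    print(
        f"beta={beta}: "
        f"conservative={penalized_objective(conservative, ref, rewards, beta):.4f}, "
        f"aggressive={penalized_objective(aggressive, ref, rewards, beta):.4f}"
    )
```

Sweeping beta in the loop shows the trade-off directly. The larger update gains more raw reward, but its KL penalty (about 0.64 nats here, versus about 0.02 for the smaller update) outweighs that gain once beta is large enough.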

Updated 2025-10-10

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences