Concept

Approximation of the Policy Divergence Penalty

In practice, the policy divergence penalty, which is defined in terms of the log-probability of a trajectory, can be simplified. The approximation computes the penalty from the policy's action probabilities alone and ignores the environment's dynamics (the state-transition probabilities), which makes the measure more computationally tractable.
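
As a minimal sketch of this simplification, the snippet below computes a penalty from per-step policy log-probabilities only. It assumes the divergence is measured against a reference policy and takes the form of a summed log-ratio over the steps of one trajectory. The function name `divergence_penalty`, the reference policy, and the example probabilities are all hypothetical and are not taken from the source.

```python
import math


def divergence_penalty(policy_logprobs, reference_logprobs):
    """Policy-only penalty for one trajectory (illustrative assumption).

    Sums log pi_theta(a_t | s_t) - log pi_ref(a_t | s_t) over time steps.
    No state-transition (dynamics) term appears anywhere in the computation.
    """
    if len(policy_logprobs) != len(reference_logprobs):
        raise ValueError("both sequences must cover the same time steps")
    return sum(p - r for p, r in zip(policy_logprobs, reference_logprobs))


# Hypothetical probabilities that each policy assigns to the actions
# actually taken along a two-step trajectory.
policy_probs = [0.5, 0.25]
reference_probs = [0.25, 0.25]

penalty = divergence_penalty(
    [math.log(p) for p in policy_probs],
    [math.log(p) for p in reference_probs],
)
print(penalty)
```

Only the first step contributes here, because the two policies agree on the second action. Working with log-probabilities rather than raw probabilities turns the product over time steps into a sum, which is the usual way to avoid numerical underflow on long trajectories.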

Updated 2026-01-15

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences