Short Answer

Analysis of the Policy Regularization Penalty

A penalty term is used during the training of a language model to prevent its behavior from drifting too far from a stable, reference version of the model. This penalty is calculated as the difference between the log-probability of a generated response under the current policy and the log-probability of the same response under the reference policy. If the current policy and the reference policy assign the exact same probability to a given response, what will the numerical value of the penalty be? Explain what this value signifies about the current policy's deviation from the reference policy for that specific response.

0

If the current policy and the reference policy assign the same probability to the response, their log-probabilities are equal, so the difference log p_current(y|x) − log p_reference(y|x) is exactly 0. A penalty of 0 signifies that, for that specific response, the current policy has not deviated from the reference policy at all: it neither inflates nor suppresses the response's probability relative to the stable reference model.
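The log-ratio above can be sketched in a few lines. This is a minimal illustrative example, not code from the chapter; the function name `drift_penalty` and the specific probabilities are assumptions for demonstration.

```python
import math

def drift_penalty(p_current: float, p_reference: float) -> float:
    """Penalty for one response: log p_current(y|x) - log p_reference(y|x)."""
    return math.log(p_current) - math.log(p_reference)

# Equal probabilities under both policies -> penalty is exactly 0 (no deviation).
print(drift_penalty(0.25, 0.25))  # -> 0.0

# Current policy assigns higher probability than the reference -> positive penalty.
print(drift_penalty(0.5, 0.25))   # -> log 2, about 0.693
```

The sign of the penalty also carries information: a positive value means the current policy has become more confident in the response than the reference, a negative value means less confident, and 0 means no drift for that response.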

Updated 2025-10-09

Tags

Ch.4 Alignment - Foundations of Large Language Models
