Short Answer

Consequences of Modifying the PPO Objective Function

A researcher is training a language model using an objective function that combines a clipped surrogate objective (which encourages high rewards) with a policy divergence penalty, controlled by a coefficient β. If the researcher decides to set β to zero for the entire training process, what are the likely consequences for the model's generated text? Describe both a potential short-term benefit and a significant long-term drawback.
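To make the trade-off concrete, here is a minimal numeric sketch of such an objective: a PPO-style clipped surrogate term plus a divergence penalty weighted by β. All names (`ppo_objective`, the sample values) are illustrative, not from any particular library; the point is that setting β = 0 deletes the penalty term, so the objective rewards drifting arbitrarily far from the reference policy whenever that raises the surrogate reward.

```python
import numpy as np

def ppo_objective(ratio, advantage, kl, beta=0.1, eps=0.2):
    """Clipped surrogate objective with a divergence penalty weighted by beta.

    ratio:     pi_new(a|s) / pi_old(a|s) probability ratios (per token)
    advantage: advantage estimates
    kl:        per-token KL divergence from the reference policy
    beta:      coefficient on the divergence penalty (the question's beta)
    eps:       PPO clipping range
    """
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    surrogate = np.minimum(ratio * advantage, clipped * advantage)
    # With beta = 0 the penalty vanishes: nothing in the objective
    # discourages the policy from leaving the reference distribution.
    return np.mean(surrogate - beta * kl)

# Toy values: two tokens, both with positive advantage and nonzero KL.
ratio = np.array([1.5, 0.8])
adv = np.array([1.0, 1.0])
kl = np.array([2.0, 2.0])

with_penalty = ppo_objective(ratio, adv, kl, beta=0.1)  # penalized
no_penalty = ppo_objective(ratio, adv, kl, beta=0.0)    # unconstrained
# Whenever KL > 0, the beta = 0 objective is strictly higher, so
# optimization pushes the policy away from the reference model.
```

This illustrates the short-term benefit (the unconstrained objective climbs faster toward high reward) and hints at the long-term drawback: with no divergence term, reward hacking and degenerate, unnatural text go unpenalized.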

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science