Learn Before
In a policy optimization method, a composite objective function is used, defined as Objective = Clipped_Surrogate_Objective - β * Divergence_Penalty. This function balances maximizing the primary objective with a penalty for how much the policy changes. What is the most likely consequence of setting the hyperparameter β to a very high value?
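One way to reason about the question is to evaluate the composite objective for a few candidate policy updates under a small and a large β. The sketch below is only illustrative: the function names, the clipping range of 0.2, the advantage of 1.0, and the candidate (probability ratio, divergence) pairs are all assumed values, not taken from any particular method or library.

```python
def composite_objective(ratio, advantage, divergence, beta, clip_eps=0.2):
    """Clipped surrogate objective minus beta times a divergence penalty.

    All inputs are scalars here for simplicity; clip_eps=0.2 is an assumed value.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    return surrogate - beta * divergence


# Hypothetical candidate updates: (probability ratio, divergence from the old policy).
CANDIDATES = {
    "no change": (1.00, 0.0),
    "small step": (1.02, 0.001),
    "medium step": (1.10, 0.01),
    "large step": (1.20, 0.05),
}


def best_update(beta, advantage=1.0):
    """Return the name of the candidate update with the highest objective value."""
    return max(
        CANDIDATES,
        key=lambda name: composite_objective(
            CANDIDATES[name][0], advantage, CANDIDATES[name][1], beta
        ),
    )


if __name__ == "__main__":
    for beta in (0.1, 100.0):
        print(f"beta={beta}: preferred update is '{best_update(beta)}'")
```

With these assumed numbers, a small β favors the largest policy change, while a very large β makes the penalty term dominate so that staying at the old policy scores highest.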
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a policy optimization method, a composite objective function is used, defined as Objective = Clipped_Surrogate_Objective - β * Divergence_Penalty. This function balances maximizing the primary objective with a penalty for how much the policy changes. What is the most likely consequence of setting the hyperparameter β to a very high value?
Stabilizing Policy Optimization Training
Analyzing the Trade-off in a Policy Optimization Objective