Learn Before
Analyzing the Trade-off in a Policy Optimization Objective
Consider the following composite objective function used in a policy optimization algorithm: Explain the fundamental trade-off that the hyperparameter β is designed to manage during the training process.
0
1
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a policy optimization method, a composite objective function is used, defined as
Objective = Clipped_Surrogate_Objective - β * Divergence_Penalty. This function balances maximizing the primary objective with a penalty for how much the policy changes. What is the most likely consequence of setting the hyperparameterβto a very high value?Stabilizing Policy Optimization Training
Analyzing the Trade-off in a Policy Optimization Objective