1Cademy - Analyzing the Trade-off in a Policy Optimization Objective

Learn Before

Composite Objective for PPO-Clip

Short Answer

Analyzing the Trade-off in a Policy Optimization Objective

Consider the following composite objective function used in a policy optimization algorithm:

$U_{\text{composite}} = U_{\text{surrogate}} - \beta \cdot \text{Penalty}$

Explain the fundamental trade-off that the hyperparameter $\beta$ is designed to manage during the training process.
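
As a rough aid for reasoning about the formula, the sketch below evaluates the composite objective for two made-up candidate updates at different values of β. The function names, the candidate labels, and every numeric value are hypothetical and chosen only for illustration; the question does not specify what the penalty term measures, so it is treated here as an opaque non-negative number.

```python
def composite_objective(surrogate: float, penalty: float, beta: float) -> float:
    """Evaluate U_composite = U_surrogate - beta * Penalty."""
    return surrogate - beta * penalty


# Hypothetical candidate updates as (surrogate value, penalty value) pairs.
# These numbers are invented for illustration, not taken from any experiment.
candidates = {
    "aggressive": (1.0, 0.5),     # larger surrogate gain, larger penalty
    "conservative": (0.6, 0.05),  # smaller surrogate gain, smaller penalty
}


def preferred_update(beta: float) -> str:
    """Return the name of the candidate with the highest composite objective."""
    return max(
        candidates,
        key=lambda name: composite_objective(*candidates[name], beta),
    )


if __name__ == "__main__":
    for beta in (0.1, 2.0):
        scores = {
            name: composite_objective(s, p, beta)
            for name, (s, p) in candidates.items()
        }
        print(f"beta={beta}: scores={scores}, preferred={preferred_update(beta)}")
```

Under these assumed numbers, the candidate that maximizes the composite objective changes as β increases, which may be a useful starting point for framing an answer.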