Impact of Penalty Coefficient on LLM Fine-Tuning
Analyze the two scenarios described in the case study below. For each scenario, predict the most likely behavior of the fine-tuned language model and explain your reasoning by referring to the components of the combined objective function used in training.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained using an objective function that balances a reward-based component with a penalty for deviating from an initial reference policy. The penalty's influence is controlled by a coefficient, β. During training, developers observe that the model's outputs, while achieving high reward scores, are becoming increasingly repetitive and nonsensical. Which of the following adjustments to β is the most appropriate first step to mitigate this issue, and why?
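The trade-off the question describes can be sketched numerically. Below is a minimal illustration (not part of the card itself) of a combined objective of the form reward minus β times a KL estimate against a frozen reference policy; the function name, the per-token KL approximation, and the numbers are all illustrative assumptions.

```python
# Minimal sketch of an RLHF-style combined objective:
# reward minus a beta-weighted KL penalty that discourages
# drifting away from a frozen reference policy.
# All names and values here are illustrative assumptions.

def combined_objective(reward, logprobs_policy, logprobs_ref, beta):
    """Per-sequence objective: reward - beta * KL-estimate.

    The KL term is approximated per token as
    log pi(y_t | x) - log pi_ref(y_t | x), summed over the sequence.
    """
    kl_estimate = sum(lp - lr for lp, lr in zip(logprobs_policy, logprobs_ref))
    return reward - beta * kl_estimate

# A degenerate, repetitive completion can score a high reward while
# drifting far from the reference policy (large KL).
reward = 2.0
logp_policy = [-0.1, -0.1, -0.1]   # policy is very confident in its output
logp_ref = [-2.0, -2.0, -2.0]      # reference finds the same text unlikely

low_beta = combined_objective(reward, logp_policy, logp_ref, beta=0.01)
high_beta = combined_objective(reward, logp_policy, logp_ref, beta=0.5)

# With a small beta the drift barely dents the objective; with a larger
# beta the same drift is penalized heavily, discouraging reward hacking.
print(round(low_beta, 3))   # 1.943
print(round(high_beta, 3))  # -0.85
```

Under these assumptions, raising β makes the penalty term dominate whenever the policy strays far from the reference, which is why increasing β is the natural first response to high-reward but degenerate outputs.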
Impact of Penalty Coefficient on LLM Fine-Tuning
Consequences of Modifying the PPO Objective Function