1Cademy - Diagnosing Training Instability

Learn Before

Baseline's Role in Centering Rewards and Reducing Gradient Variance

Case Study

Diagnosing Training Instability

Given the following scenario, explain why the chosen baseline is suboptimal and is likely contributing to training instability. Then, propose a more effective constant value for the baseline and justify why it would lead to more stable gradient updates.

Updated 2025-10-02

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences