Analyzing Training Instability in a Network Sub-layer
Based on the described sub-layer architecture, explain why the placement of the normalization step might be contributing to the observed training instability. Specifically, how does the computational path affect the gradient flow back to the main branch of the residual connection?
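The contrast at issue can be sketched in code. Below is a minimal numpy illustration (the function names `sublayer_post_norm` and `sublayer_pre_norm` and the bare `layer_norm` without learned scale/shift are hypothetical simplifications, not from the source): in the Post-Norm arrangement the normalization sits directly on the residual path, so every gradient flowing back to the input x must pass through Norm, whereas in the Pre-Norm arrangement the identity path from x to the output is left untouched.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Minimal LayerNorm over the feature dimension (no learned gain/bias).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def sublayer_post_norm(x, F):
    # Post-Norm: Norm(x + F(x)). The normalization lies on the residual
    # path itself, so gradients to x are rescaled by Norm's Jacobian.
    return layer_norm(x + F(x))

def sublayer_pre_norm(x, F):
    # Pre-Norm: x + F(Norm(x)). The identity path x -> output bypasses
    # Norm entirely, giving a direct gradient route back to the input.
    return x + F(layer_norm(x))

# Illustration: with a constant input, layer_norm maps it to zero, so the
# Post-Norm output loses the input signal, while Pre-Norm passes x through.
x = np.ones((2, 4))
F = lambda h: 0.5 * h  # toy sub-layer function with F(0) = 0
print(np.allclose(sublayer_pre_norm(x, F), x))   # identity path preserved
```

The point of the toy check: because Pre-Norm's residual branch is never normalized, the input survives unchanged when the sub-layer function contributes nothing, which mirrors the cleaner gradient highway that is commonly cited as the reason Pre-Norm trains more stably.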
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A sub-layer in a neural network processes an input tensor. The sub-layer uses a specific architectural pattern where a residual connection and a normalization step are applied after the main sub-layer function. Arrange the following operations in the correct sequence to compute the final output of this sub-layer.
A sub-layer within a neural network processes an input x. The design specifies that the output of the sub-layer's main function, F(x), is first added to the original input x. A normalization function, Norm(·), is then applied to the result of this addition. Which of the following expressions accurately models this computation?