Concept

Learning Rate and Training Time Trade-off in LLMs

Achieving stable training for large models with gradient descent often requires selecting a small learning rate: large update steps can overshoot along sharp directions of the loss surface, causing loss spikes or outright divergence. This choice introduces a critical trade-off: while a smaller learning rate helps prevent training instability, each update makes less progress, so many more optimization steps are needed to reach the same loss, which significantly increases the overall training time.

Updated 2026-04-21
Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences