Concept
Learning Rate Dilemma in Ill-conditioned Problems
When applying standard gradient descent to an ill-conditioned objective function, choosing the learning rate poses a dilemma. Because the gradient is much larger, and changes much more rapidly, in some dimensions than in others, no single learning rate suits all of them. A small learning rate keeps the solution from diverging in the steep directions (e.g., $x_2$ in $f(\mathbf{x}) = 0.1 x_1^2 + 2 x_2^2$, the example used in D2L) but results in extremely slow convergence in the flat directions (e.g., $x_1$ in the same function). Conversely, a large learning rate speeds up progress in the flat directions but causes the solution to oscillate wildly or diverge in the steep directions, significantly degrading the overall quality of the solution.
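The dilemma can be seen numerically with a minimal sketch. It assumes the quadratic $f(x_1, x_2) = 0.1 x_1^2 + 2 x_2^2$ as the ill-conditioned objective, with $x_2$ as the steep direction and $x_1$ as the flat one. The function name `gd`, the starting point (-5, -2), the step count, and the three learning rates are illustrative choices, not something the text above prescribes.

```python
def gd(eta, steps=20, x1=-5.0, x2=-2.0):
    """Run plain gradient descent on f(x1, x2) = 0.1 * x1**2 + 2 * x2**2.

    Assumed example objective: x2 is the steep direction, x1 the flat one.
    Returns the final iterate (x1, x2) after `steps` updates.
    """
    for _ in range(steps):
        # Analytic gradient of the assumed quadratic.
        g1, g2 = 0.2 * x1, 4.0 * x2
        # Same learning rate eta is applied to both coordinates.
        x1, x2 = x1 - eta * g1, x2 - eta * g2
    return x1, x2


if __name__ == "__main__":
    # Each update multiplies x1 by (1 - 0.2 * eta) and x2 by (1 - 4 * eta),
    # so x2 stays bounded only while |1 - 4 * eta| < 1, i.e. eta < 0.5.
    for eta in (0.05, 0.4, 0.6):
        x1, x2 = gd(eta)
        print(f"eta={eta}: x1={x1:.4f}, x2={x2:.4f}")
```

With the small rate (0.05) the steep coordinate $x_2$ settles quickly while the flat coordinate $x_1$ has barely moved from its start. With the large rate (0.6) $x_1$ moves much faster, but $x_2$ flips sign on every step and grows in magnitude. The intermediate rate (0.4) is still below the stability limit for $x_2$, yet $x_1$ remains far from the minimum after 20 steps.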
Updated 2026-05-15
Tags
D2L
Dive into Deep Learning @ D2L