Concept

Learning Rate Dilemma in Ill-conditioned Problems

When standard gradient descent is applied to an ill-conditioned objective function, choosing the learning rate becomes a dilemma. The gradient changes at drastically different rates along different dimensions. A small learning rate keeps the iterates from diverging in the steep directions (e.g., $x_2$), but convergence in the flat directions (e.g., $x_1$) becomes extremely slow. A large learning rate speeds up progress in the flat directions, but the iterates then oscillate wildly or diverge in the steep directions, which severely degrades the overall solution.
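The Python sketch below makes the dilemma concrete. It assumes an illustrative quadratic, f(x) = 0.1 x_1^2 + 2 x_2^2. This function is our own choice for the example and is not taken from the text above. Its curvature along x_2 is 20 times that along x_1. Each gradient step multiplies x_1 by (1 - 0.2η) and x_2 by (1 - 4η). As a result, x_2 stops converging once η ≥ 0.5, while x_1 shrinks by only 1% per step at η = 0.05. The function names `grad` and `gradient_descent` and the starting point (-5, -2) are hypothetical.

```python
# Illustrative ill-conditioned objective (an assumption for this sketch):
#   f(x1, x2) = 0.1 * x1**2 + 2 * x2**2
# The x2 direction is 20x steeper than the x1 direction.

def grad(x1, x2):
    """Gradient of f: (df/dx1, df/dx2)."""
    return 0.2 * x1, 4.0 * x2

def gradient_descent(eta, steps=20, x1=-5.0, x2=-2.0):
    """Run plain gradient descent with learning rate eta; return the final point."""
    for _ in range(steps):
        g1, g2 = grad(x1, x2)
        # Per step: x1 is scaled by (1 - 0.2*eta), x2 by (1 - 4*eta).
        x1 -= eta * g1
        x2 -= eta * g2
    return x1, x2

# Small, borderline, and too-large learning rates (hypothetical choices).
for eta in (0.05, 0.4, 0.6):
    x1, x2 = gradient_descent(eta)
    print(f"eta={eta}: x1={x1:.4f}, x2={x2:.4f}")
```

With η = 0.05, x_2 nearly vanishes after 20 steps while x_1 has barely moved from its start. With η = 0.4, just below the stability limit, x_2 still flips sign each step but decays, and x_1 gets much closer to 0. With η = 0.6, x_1 progresses faster still, but x_2 grows in magnitude with alternating sign, which is the divergence described above.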


Updated 2026-05-15


Tags

D2L

Dive into Deep Learning @ D2L