Gradient Descent on a Nonconvex Function with Local Minima
For nonconvex objective functions, gradient descent can converge to a local minimum rather than the global minimum, and which local minimum it reaches depends on both the learning rate and how well conditioned the problem is. As an illustration, consider the function $f(x) = x \cos(cx)$ for a constant $c$, which has infinitely many local minima because of its oscillatory structure. When gradient descent is applied with an unrealistically high learning rate, the steps are large enough to jump over deeper minima, and the iterate ends up in a poor local minimum. The learning rate therefore affects not only how fast gradient descent converges but also which solution it finds on a nonconvex landscape.
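The effect can be reproduced with a few lines of Python. The following is a minimal sketch, assuming the objective is f(x) = x cos(cx); the constant c = 0.15π, the starting point x = 10, and the two learning rates (2.0 and 0.05) are illustrative choices made here, not values stated in the text above.

```python
import math

# Assumed constant for illustration; the text only says "a constant c".
C = 0.15 * math.pi


def f(x):
    """Objective f(x) = x * cos(c * x)."""
    return x * math.cos(C * x)


def f_grad(x):
    """Derivative f'(x) = cos(c * x) - c * x * sin(c * x)."""
    return math.cos(C * x) - C * x * math.sin(C * x)


def gradient_descent(x0, lr, steps):
    """Run plain gradient descent and return every iterate, including x0."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x -= lr * f_grad(x)
        trajectory.append(x)
    return trajectory


# Hypothetical settings: the same start, one very large and one small step size.
large = gradient_descent(x0=10.0, lr=2.0, steps=10)
small = gradient_descent(x0=10.0, lr=0.05, steps=200)

print(f"lr=2.0 : x = {large[-1]:.4f}, f(x) = {f(large[-1]):.4f}")
print(f"lr=0.05: x = {small[-1]:.4f}, f(x) = {f(small[-1]):.4f}")
```

Under these assumptions, the first large step carries the iterate from x = 10 straight past the minimum near x ≈ 7.27, and the run ends up oscillating around a shallower minimum at negative x. The small-step run descends into the nearby minimum, where the objective value is considerably lower. Neither run is guaranteed to find the global minimum, since the function is unbounded below as |x| grows.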
Tags
D2L
Dive into Deep Learning @ D2L