
Applicability of Second-Order Methods in Deep Learning

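Second-order optimization algorithms such as Newton's method use curvature information to choose both the direction and the size of each step. Instead of following the raw gradient, they rescale it by the inverse of the Hessian matrix $\mathbf{H}$, giving the update $\mathbf{x} \leftarrow \mathbf{x} - \mathbf{H}^{-1}\nabla f(\mathbf{x})$. In principle this is a real advantage, but such methods are generally impractical for deep neural networks. The main obstacle is the cost of working with $\mathbf{H}$. For a model with $d$ parameters, the Hessian has $\mathcal{O}(d^2)$ entries, so merely storing it becomes impossible once $d$ reaches the millions, and computing all of those entries via backpropagation is prohibitively expensive. Taking a Newton step also requires solving a linear system in $\mathbf{H}$, which costs $\mathcal{O}(d^3)$ operations for a dense matrix. As a result, applying pure second-order methods directly is infeasible for large-scale deep learning.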
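To make the $\mathcal{O}(d^2)$ storage cost concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a dense Hessian stored as 32-bit floats (4 bytes per entry) and ignores any additional memory needed to compute or factorize it. The function names are illustrative only and are not part of any library.

```python
# Rough memory estimate: gradient (O(d) entries) vs. dense Hessian (O(d^2) entries).
# Assumption: every entry is a float32, i.e. 4 bytes.

def gradient_bytes(d: int, bytes_per_entry: int = 4) -> int:
    """Bytes needed to store a gradient vector with d entries."""
    return d * bytes_per_entry


def hessian_bytes(d: int, bytes_per_entry: int = 4) -> int:
    """Bytes needed to store a dense d x d Hessian matrix."""
    return d * d * bytes_per_entry


def human_readable(num_bytes: float) -> str:
    """Format a byte count using decimal (powers of 1000) units."""
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if num_bytes < 1000:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1000
    return f"{num_bytes:.1f} EB"


# Compare storage for a few model sizes.
for d in (10**3, 10**6, 10**8):
    print(
        f"d = {d:>11,}: gradient {human_readable(gradient_bytes(d))}, "
        f"Hessian {human_readable(hessian_bytes(d))}"
    )
```

Under these assumptions, a model with $10^6$ parameters needs only about 4 MB for its gradient but about $4 \times 10^{12}$ bytes (roughly 4 TB) for a dense Hessian, and many modern networks have far more parameters than that.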


Updated 2026-05-15


Tags

D2L

Dive into Deep Learning @ D2L