
Alternative to the Hessian (Krylov Methods)

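Sometimes our models need higher-order derivatives to learn. Second-order derivatives are collected in the Hessian matrix, but the Hessian has one entry for every pair of parameters. With millions or even billions of parameters in our models, it is impractical to compute the full Hessian or even to store it. Krylov methods avoid this because they only need the product of the Hessian with a vector, and that product can be computed without ever forming the Hessian itself.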

For some function $f:\mathbb{R}^n\rightarrow \mathbb{R}$ with a Hessian $\mathbf{H}$, and an arbitrary vector $v$:

$$\mathbf{H}v=\nabla_{\mathbf{x}}\left[(\nabla_{\mathbf{x}}f(\mathbf{x}))^{\top}v\right]$$
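The identity can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the test function f(x) = x0^2 * x1 + x1^3, the helper names, and the step size `eps` are all hypothetical choices, not part of the original note. It treats v as a constant, forms the scalar (∇f(x))ᵀv, and differentiates that scalar with central differences, then compares the result with the product computed from an explicitly written Hessian.

```python
# Numerical check of Hv = grad_x[(grad_x f(x))^T v].
# Assumption: f(x) = x0^2 * x1 + x1^3 is a hypothetical test function.

def grad_f(x):
    """Analytic gradient of f(x) = x0^2 * x1 + x1^3."""
    x0, x1 = x
    return [2.0 * x0 * x1, x0 ** 2 + 3.0 * x1 ** 2]

def hessian_f(x):
    """Explicit Hessian of f, used only as a reference for the check."""
    x0, x1 = x
    return [[2.0 * x1, 2.0 * x0],
            [2.0 * x0, 6.0 * x1]]

def grad_dot_v(x, v):
    """Scalar g(x) = (grad f(x))^T v, with v held constant."""
    return sum(g_i * v_i for g_i, v_i in zip(grad_f(x), v))

def hvp(x, v, eps=1e-5):
    """Approximate Hv as the gradient of g(x), using central differences."""
    result = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        diff = grad_dot_v(x_plus, v) - grad_dot_v(x_minus, v)
        result.append(diff / (2.0 * eps))
    return result

def explicit_hv(x, v):
    """Reference product: build the full Hessian, then multiply by v."""
    H = hessian_f(x)
    n = len(v)
    return [sum(H[i][j] * v[j] for j in range(n)) for i in range(n)]

x = [1.0, 2.0]
v = [3.0, -1.0]
hv_identity = hvp(x, v)          # never forms the Hessian
hv_explicit = explicit_hv(x, v)  # forms the Hessian, for comparison only
print(hv_identity, hv_explicit)
```

Only `explicit_hv` ever builds the matrix. In practice the outer gradient would typically come from automatic differentiation rather than finite differences, so the product costs roughly one additional gradient evaluation instead of a pass over every coordinate.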


Updated 2021-06-11

Tags

Data Science