Conditioning the Hessian
The condition number of the Hessian matrix is the ratio of its largest singular value to its smallest singular value. When the Hessian is symmetric positive definite (that is, all its eigenvalues are positive), the singular values equal the eigenvalues, so the condition number is the ratio of the largest eigenvalue to the smallest. When the condition number is very high, we say that the matrix is ill-conditioned. This typically indicates that some rows or columns of the matrix are nearly linearly dependent, and it means there is a large gap between the smallest and largest eigenvalues. For neural nets, it means that the loss function has almost parallel contours that are stretched out into very long, narrow ellipsoids rather than being close to circular. The gradient is always perpendicular to the contours, so on such a surface it points mostly across the narrow valley rather than toward the minimum. As a result, gradient descent steps zigzag from one side of the valley to the other, or must be made very small, and progress toward the minimum is slow. A further point to consider is that if the Hessian is severely ill-conditioned, second-order methods, which require computing the inverse of the Hessian, become numerically unstable.
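To make the effect concrete, here is a minimal sketch in Python with NumPy. It assumes a simple quadratic loss f(x) = 0.5 * x^T H x, whose Hessian is exactly H. The helper names `condition_number` and `gd_steps_to_converge`, the two example matrices, and the step-size choice of 1 / (largest eigenvalue) are illustrative assumptions, not part of the original text.

```python
import numpy as np

def condition_number(H):
    """Condition number of a symmetric positive definite matrix:
    largest eigenvalue divided by smallest eigenvalue."""
    eigvals = np.linalg.eigvalsh(H)  # eigenvalues in ascending order
    return eigvals[-1] / eigvals[0]

def gd_steps_to_converge(H, x0, tol=1e-6, max_steps=100_000):
    """Count gradient descent steps on f(x) = 0.5 * x^T H x until ||x|| < tol.
    The step size 1 / lambda_max is an assumed, conservative choice."""
    lr = 1.0 / np.linalg.eigvalsh(H)[-1]
    x = np.asarray(x0, dtype=float)
    for step in range(max_steps):
        if np.linalg.norm(x) < tol:
            return step
        x = x - lr * (H @ x)  # gradient of f at x is H @ x
    return max_steps

# Hypothetical Hessians: one well-conditioned, one ill-conditioned.
H_good = np.diag([1.0, 2.0])
H_bad = np.diag([1.0, 1000.0])
x0 = [1.0, 1.0]

kappa_good = condition_number(H_good)
kappa_bad = condition_number(H_bad)
steps_good = gd_steps_to_converge(H_good, x0)
steps_bad = gd_steps_to_converge(H_bad, x0)

print(f"well-conditioned: kappa = {kappa_good:.0f}, steps = {steps_good}")
print(f"ill-conditioned:  kappa = {kappa_bad:.0f}, steps = {steps_bad}")
```

With this conservative step size the iterates do not oscillate, but the step is limited by the steep direction, so progress along the shallow direction is very slow and the ill-conditioned problem needs far more iterations. A larger step size would speed up the shallow direction at the cost of overshooting back and forth across the steep one, which is the zigzag behaviour.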
Tags
Data Science