
Why does regularization prevent overfitting?

Adding an extra term to the cost, (lambda / 2m) times the sum of the squared norms of the weight matrices, penalizes the weights for growing too large. But why does this prevent overfitting? If lambda is very large, gradient descent is incentivized to push the weight matrices close to 0. In other words, the impact of many hidden units is nearly zeroed out, leaving what behaves like a much simpler network. The hidden units are still there, but each now has a much smaller effect. Moreover, with a small W the pre-activation z = Wa + b is also small, and for small z an activation function such as tanh is roughly linear (tanh(z) ≈ z). A network whose layers all compute roughly linear functions is itself roughly linear, so it cannot carve out the highly complex decision boundaries that overfit the training data.
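To see this mechanically, here is a minimal NumPy sketch (hypothetical names and values, not from the card above): it isolates the weight-decay term (lambda / m) * W of the gradient update, so larger lambda values drive the norm of W toward 0, which in turn makes tanh(z) nearly indistinguishable from z.

```python
import numpy as np

# Illustrative sketch of L2 regularization ("weight decay").
# All names and values (m, lr, steps, lam, W0, a_prev) are made up
# for the demo; they do not come from the card above.

np.random.seed(0)
m, lr, steps = 100, 0.1, 200
W0 = np.random.randn(3, 3)       # same starting weights for every lambda
a_prev = np.random.randn(3)      # activations from the previous layer

for lam in (0.0, 10.0, 500.0):   # lam stands in for the lambda hyperparameter
    W = W0.copy()
    for _ in range(steps):
        data_grad = np.zeros_like(W)            # zero data gradient isolates
        W -= lr * (data_grad + (lam / m) * W)   # the weight-decay term
    z = W @ a_prev                              # bias omitted for brevity
    print(f"lambda={lam:6.1f}  ||W||={np.linalg.norm(W):.5f}  "
          f"max|tanh(z) - z|={np.abs(np.tanh(z) - z).max():.6f}")
```

With lambda = 0 the weights keep their initial size and tanh(z) visibly bends away from z; with lambda = 500 the weights collapse toward 0 and the activation is effectively linear, which is exactly the "simpler, nearly linear network" argument made above.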


Updated 2021-05-28

Tags

Data Science