Concept

Regularization Constant

The trade-off between the standard prediction loss and the additive weight decay penalty is characterized by the regularization constant $\lambda$. This nonnegative hyperparameter is fit using validation data and modifies the objective to $L(\mathbf{w}, b) + \frac{\lambda}{2} \|\mathbf{w}\|^2$. When $\lambda = 0$, the original loss function is recovered. For $\lambda > 0$, the size of the weights is restricted, with larger values of $\lambda$ constraining the weights more strongly. The penalty term is divided by $2$ by convention so that the constant cancels gracefully when the derivative of the quadratic penalty is taken, leaving a gradient contribution of simply $\lambda \mathbf{w}$.
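As a minimal sketch of the penalized objective (assuming a squared-error loss and NumPy; the function and argument names here are illustrative, not from the text), the computation might look like:

```python
import numpy as np

def penalized_loss(w, b, X, y, lam):
    """Squared-error loss plus the weight decay penalty (lam / 2) * ||w||^2."""
    residual = X @ w + b - y
    data_loss = 0.5 * np.mean(residual ** 2)
    # By convention the bias b is not penalized; only the weights shrink.
    penalty = (lam / 2) * np.sum(w ** 2)
    return data_loss + penalty
```

Setting `lam = 0` recovers the plain loss, and the `lam / 2` factor means the gradient of the penalty with respect to `w` is just `lam * w`, matching the convention described above.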


Updated 2026-05-03

Tags

Data Science

D2L

Dive into Deep Learning @ D2L