Regularization Constant
The trade-off between the standard prediction loss and the additive weight decay penalty is characterized by the regularization constant, $\lambda$. This nonnegative hyperparameter is fit using validation data and modifies the objective to $L(\mathbf{w}, b) + \frac{\lambda}{2} \|\mathbf{w}\|^2$. When $\lambda = 0$, the original loss function is recovered. For $\lambda > 0$, the size of the weights is restricted, with larger values of $\lambda$ constraining the weights more considerably. The penalty term is divided by $2$ by convention so that the constant cancels out gracefully when the derivative of the quadratic penalty is taken.
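To make this concrete, here is a minimal Python sketch of how $\lambda$ enters the training objective. The helper names `l2_penalty` and `regularized_loss`, the squared-error loss, and the parameter name `lambd` are illustrative assumptions for this example, not a fixed API:

```python
import torch

def l2_penalty(w):
    # Squared L2 norm of the weights, halved so that the factor of 2
    # produced by differentiating w**2 cancels in the gradient.
    return (w ** 2).sum() / 2

def regularized_loss(y_hat, y, w, lambd):
    # Standard squared-error prediction loss.
    loss = ((y_hat - y) ** 2 / 2).mean()
    # lambd = 0 recovers the original loss; larger lambd
    # shrinks the weights more aggressively.
    return loss + lambd * l2_penalty(w)
```

In practice, deep learning frameworks expose the same constant directly; for example, PyTorch optimizers accept a `weight_decay` argument, so `torch.optim.SGD(net.parameters(), lr=0.1, weight_decay=lambd)` applies the penalty through the gradient update without modifying the loss function itself.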