
Ridge Regression

In ridge regression, the coefficients and intercept are learned with the same least-squares criterion as OLS, but with an added penalty on large coefficients: the coefficients are chosen to minimize the residual sum of squares plus a penalty term, whose strength is controlled by a tuning parameter. Once the parameters are learned, the ridge prediction formula is the same as in OLS. Ridge regression uses L2 regularization, which penalizes the sum of squared coefficients; the influence of the regularization term is controlled by the parameter α (written λ in the textbook formula below). Higher α means more regularization and a simpler model. Ridge regression is particularly useful when the number of predictor variables is greater than the number of observations. Below is the formula found in our textbook.

$$\sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\right)^2 + \lambda\sum_{j=1}^{p}\beta_j^2 = \mathrm{RSS} + \lambda\sum_{j=1}^{p}\beta_j^2.$$

Note: Ridge regression is sensitive to the scale of the variables. Therefore, we usually standardize the predictors before applying ridge regression.
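The criterion above has a well-known closed-form minimizer, β = (XᵀX + λI)⁻¹Xᵀy, once the predictors are standardized and the responses centered (so the unpenalized intercept drops out). A minimal NumPy sketch, using made-up data and a hypothetical λ purely for illustration:

```python
import numpy as np

# Synthetic data with deliberately mixed scales (illustrative, not from the note)
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p)) * np.array([1.0, 10.0, 100.0])
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=n)

# Standardize predictors so the L2 penalty treats all coefficients equally
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
# Center the response: the intercept is not penalized and equals mean(y)
y_c = y - y.mean()

# Closed-form ridge solution: beta = (X^T X + lam * I)^{-1} X^T y
lam = 1.0
beta = np.linalg.solve(X_std.T @ X_std + lam * np.eye(p), X_std.T @ y_c)

# A much larger lam shrinks the coefficient vector toward zero (simpler model)
beta_big = np.linalg.solve(X_std.T @ X_std + 1000.0 * np.eye(p), X_std.T @ y_c)
print(np.linalg.norm(beta), np.linalg.norm(beta_big))
```

Increasing λ strictly shrinks the norm of the solution, which matches the note's point that a higher regularization strength yields a simpler model.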


Updated 2026-05-03

Tags

Data Science

D2L

Dive into Deep Learning @ D2L