
Ridge Regression

In ridge regression, the coefficients and intercept are learned with the same least-squares criterion as OLS, but with an added penalty on large coefficients: the coefficients are chosen to minimize the residual sum of squares plus a penalty term, whose strength is controlled by a tuning parameter. Once the parameters are learned, the ridge prediction formula is the same as in OLS. Ridge regression uses L2 regularization, which penalizes the sum of squared coefficients; the influence of the regularization term is controlled by the parameter α (written λ in the textbook formula below). Higher α means more regularization and a simpler model. Ridge regression is particularly useful when the number of predictor variables is greater than the number of observations. Below is the formula found in our textbook.

$$\sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\right)^2 + \lambda\sum_{j=1}^{p}\beta_j^2 = \mathrm{RSS} + \lambda\sum_{j=1}^{p}\beta_j^2.$$

Note: Ridge regression is sensitive to the scale of the variables. Therefore, we usually standardize the predictors before applying ridge regression.
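The criterion above has a well-known closed-form minimizer, β = (XᵀX + λI)⁻¹Xᵀy, once the predictors are standardized and the responses centered (so the unpenalized intercept drops out). A minimal NumPy sketch, using made-up data and a hypothetical λ purely for illustration:

```python
import numpy as np

# Synthetic data with deliberately mixed scales (illustrative, not from the note)
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p)) * np.array([1.0, 10.0, 100.0])
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=n)

# Standardize predictors so the L2 penalty treats all coefficients equally
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
# Center the response: the intercept is not penalized and equals mean(y)
y_c = y - y.mean()

# Closed-form ridge solution: beta = (X^T X + lam * I)^{-1} X^T y
lam = 1.0
beta = np.linalg.solve(X_std.T @ X_std + lam * np.eye(p), X_std.T @ y_c)

# A much larger lam shrinks the coefficient vector toward zero (simpler model)
beta_big = np.linalg.solve(X_std.T @ X_std + 1000.0 * np.eye(p), X_std.T @ y_c)
print(np.linalg.norm(beta), np.linalg.norm(beta_big))
```

Increasing λ strictly shrinks the norm of the solution, which matches the note's point that a higher regularization strength yields a simpler model.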


Updated 2026-05-03

Tags

Data Science

D2L

Dive into Deep Learning @ D2L