Concept

Smoothing splines avoid overfitting

If we fit a curve $g(x)$ to a set of data by minimizing $\mathrm{RSS} = \sum^n_{i=1}(y_i - g(x_i))^2$ without placing any constraints on $g$, we can always make the RSS zero by choosing a $g$ that interpolates all of the $y_i$, which overfits the data. What we want is a function $g$ that makes the RSS small but is also smooth. To get one, we replace the RSS objective with a penalized version: $\sum^n_{i=1}(y_i - g(x_i))^2 + \lambda \int g''(t)^2 \, dt$, where $\lambda$ is a nonnegative tuning parameter. The first term rewards fitting the data, and the second penalizes curvature in $g$: at $\lambda = 0$ the penalty vanishes and $g$ is free to interpolate the data, while as $\lambda \to \infty$ the minimizer approaches the least-squares straight line. The function $g$ that minimizes this objective is known as a smoothing spline.
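
To make the trade-off concrete, here is a minimal sketch in Python that computes the fitted values of a smoothing spline at the data points. It is not part of the original note: it assumes NumPy is available, assumes the $x_i$ are distinct and sorted, and uses the standard matrix form of the penalty for natural cubic splines, $\int g''(t)^2 \, dt = \mathbf{g}^\top K \mathbf{g}$ with $K = Q R^{-1} Q^\top$. The function name `smoothing_spline_fit` and the toy data are hypothetical.

```python
import numpy as np


def smoothing_spline_fit(x, y, lam):
    """Return (fitted values at x, roughness) for tuning parameter lam.

    Assumes x is sorted with distinct values and len(x) >= 3.
    Roughness is the integral of g''(t)^2 for the fitted natural cubic spline.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    h = np.diff(x)  # spacing between neighbouring x values

    # Q maps function values to scaled second differences at interior points;
    # R is the tridiagonal matrix that weights them in the penalty.
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for c in range(n - 2):
        j = c + 1  # index of the interior point for this column
        Q[j - 1, c] = 1.0 / h[j - 1]
        Q[j, c] = -1.0 / h[j - 1] - 1.0 / h[j]
        Q[j + 1, c] = 1.0 / h[j]
        R[c, c] = (h[j - 1] + h[j]) / 3.0
        if c < n - 3:
            R[c, c + 1] = R[c + 1, c] = h[j] / 6.0

    # Penalty matrix: g^T K g equals the integral of g''(t)^2.
    K = Q @ np.linalg.solve(R, Q.T)

    # Minimizing ||y - g||^2 + lam * g^T K g gives (I + lam K) g = y.
    g = np.linalg.solve(np.eye(n) + lam * K, y)
    return g, float(g @ K @ g)


# Toy data (made up for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

for lam in [0.0, 1e-4, 1e-2, 1e4]:
    g, roughness = smoothing_spline_fit(x, y, lam)
    rss = float(np.sum((y - g) ** 2))
    print(f"lambda={lam:g}  RSS={rss:.4f}  roughness={roughness:.4f}")
```

As $\lambda$ grows, the printed RSS should rise while the roughness falls, which is the trade-off described above. For practical use, a library routine is usually preferable to this dense-matrix sketch; SciPy 1.10 and later, for example, provide `scipy.interpolate.make_smoothing_spline`.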

Updated 2020-02-24

Tags

Data Science

Learn After