Training Error and Test Error
Typically, when training a machine learning model, we have access to a training set. We can compute some error measure on that set, called the training error, and we adjust the model to reduce it.
What separates machine learning from optimization is that we want the generalization error, also called the test error, to be low as well. The test error is defined as the expected value of the error on a new input. Here the expectation is taken across different possible inputs, drawn from the distribution of inputs we expect the system to encounter in practice.
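The two quantities can be written side by side as a sketch. The notation here is an assumption and does not come from the text above: $f$ is the trained model, $L$ is a per-example error measure, $p_{\text{data}}$ is the distribution of inputs (with their targets) the system is expected to encounter, and $(x_i, y_i)$, $i = 1, \dots, m$, are the training examples.

```latex
% Training error: average error over the m training examples.
\mathrm{Err}_{\text{train}}(f) = \frac{1}{m} \sum_{i=1}^{m} L\bigl(f(x_i),\, y_i\bigr)

% Test (generalization) error: expected error on a new example
% drawn from the data distribution.
\mathrm{Err}_{\text{test}}(f) = \mathbb{E}_{(x,\, y) \sim p_{\text{data}}} \bigl[ L\bigl(f(x),\, y\bigr) \bigr]
```

Optimization alone only targets the first quantity; machine learning cares about the second, which cannot be computed exactly because $p_{\text{data}}$ is unknown.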
We typically estimate the test error of a machine learning model by measuring its performance on a test set of examples that were collected separately from the training set.
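As an illustration of estimating test error from a held-out set, here is a minimal sketch using NumPy. Everything in it is an assumption made for the example rather than something taken from the text: the synthetic linear data, the helper names `make_data` and `mean_squared_error`, the choice of mean squared error as the error measure, and the use of least-squares linear regression as the model.

```python
import numpy as np

def mean_squared_error(X, y, w):
    """Average squared error of the linear model X @ w against targets y."""
    residuals = X @ w - y
    return float(np.mean(residuals ** 2))

def make_data(rng, n, noise=0.5):
    """Hypothetical data source: y = 2x + 1 plus Gaussian noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    X = np.column_stack([np.ones(n), x])  # bias column plus one feature
    y = 2.0 * x + 1.0 + rng.normal(0.0, noise, size=n)
    return X, y

rng = np.random.default_rng(0)

# Training and test sets are collected separately but come from the
# same distribution.
X_train, y_train = make_data(rng, 20)
X_test, y_test = make_data(rng, 10_000)

# Fit by least squares, which minimizes the training error directly.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_error = mean_squared_error(X_train, y_train, w)
test_error = mean_squared_error(X_test, y_test, w)  # estimate of the test error

print(f"training error: {train_error:.4f}")
print(f"estimated test error: {test_error:.4f}")
```

The test set is never used when fitting `w`, so the error measured on it is an estimate of the expected error on new inputs, whereas the training error is the quantity the fit was chosen to make small.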
Tags
Data Science
Related
Training Error and Test Error
Data Sampling Notation from a Distribution
Conditional Probability of Pairwise Preference
A team develops a model to predict customer churn using historical data from 2019-2021. The model performs exceptionally well on a portion of this historical data set aside for testing. However, when deployed to predict churn for customers in 2023, its performance is poor. A major new loyalty program was introduced at the beginning of 2023, altering customer retention patterns. Which of the following statements best analyzes the most likely reason for this discrepancy?
A data scientist is tasked with building a model to predict real estate prices for an entire metropolitan area. To do this, they must create a training set and a test set. Which of the following data collection and splitting strategies presents the most significant risk of violating the fundamental assumption that both datasets are drawn from the same underlying probability distribution?
Evaluating Data Sourcing for a Spam Filter
Overfitting a supervised statistical model
Generalizability of a supervised statistical model
Underfitting a supervised statistical model
Measuring Model Complexity: Rademacher complexity
Bias of Supervised Models in Statistical Learning
Variance of Supervised Models in Statistical Learning
Falsifiability of Machine Learning Models
Notions of Model Complexity
Relationship Between Dataset Size and Model Complexity