Learn Before
Resampling in supervised statistical learning
To learn more about a data set and fitted model we can use resampling methods to repeatedly draw samples from the training set to build new models in order to estimate the fitted models uncertainty (can yeild estimates for standard errors and confidence limits).
1
6
Contributors are:
Who are from:
Tags
Data Science
Related
Training or Fitting a supervised statistical learning model
Resampling in supervised statistical learning
Evaluating a Data Strategy for Model Development
A data science team is developing a predictive model. They start with a large, comprehensive dataset which they split into three separate, non-overlapping subsets. One subset is used to iteratively adjust the model's internal parameters to learn patterns. A second subset is used to periodically check the model's performance during development and to make decisions about its overall structure (e.g., its complexity). A third subset is kept completely separate and is only used once, at the very end, to get a final, unbiased measure of the model's real-world performance. Which of the following statements best distinguishes the role of the first subset from the other two?
A financial services company aims to build a model that predicts whether a new loan applicant is likely to default. To create the model, the data science team uses a dataset consisting exclusively of applicants who were previously approved for loans and have a perfect repayment history. What is the most significant flaw in this approach regarding the data used to build the model?