Learn Before
A data science team is developing a predictive model. They start with a large, comprehensive dataset which they split into three separate, non-overlapping subsets. One subset is used to iteratively adjust the model's internal parameters to learn patterns. A second subset is used to periodically check the model's performance during development and to make decisions about its overall structure (e.g., its complexity). A third subset is kept completely separate and is only used once, at the very end, to get a final, unbiased measure of the model's real-world performance. Which of the following statements best distinguishes the role of the first subset from the other two?
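The three-way split described in the scenario can be sketched as follows. This is a minimal illustration, not the team's actual procedure: the function name `three_way_split`, the 70/15/15 proportions, and the fixed seed are all assumptions chosen for the example, since the scenario does not specify how the data is divided.

```python
import random

def three_way_split(records, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle records and cut them into three non-overlapping subsets.

    The fractions and seed are illustrative assumptions, not values
    taken from the scenario.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]                      # used to adjust parameters
    validation = shuffled[n_train:n_train + n_val]  # checked during development
    test = shuffled[n_train + n_val:]               # used once, at the very end
    return train, validation, test

train, validation, test = three_way_split(range(100))
print(len(train), len(validation), len(test))
```

Because the slices come from a single shuffled list, each record falls into exactly one subset, which is what keeps the three subsets non-overlapping.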
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Training or Fitting a supervised statistical learning model
Resampling in supervised statistical learning
Evaluating a Data Strategy for Model Development
A financial services company aims to build a model that predicts whether a new loan applicant is likely to default. To create the model, the data science team uses a dataset consisting exclusively of applicants who were previously approved for loans and have a perfect repayment history. What is the most significant flaw in this approach regarding the data used to build the model?