1Cademy - Evaluating Data Sourcing for a Spam Filter

Learn Before

Data-Generating Process and Data-Generating Distribution (in Machine Learning)

Case Study

A machine learning team is building a spam filter for a new global email service set to launch next month. They need to create training and test datasets to develop and validate their model. They have two options for sourcing their data. Evaluate the two options below and recommend which one is more likely to result in a model that performs well on real-world user emails after the service launches. Justify your recommendation based on the relationship between the sourced data and the data the model will encounter in production.
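The relationship between sourced data and production data can be made concrete by comparing their empirical distributions. The following is a minimal sketch, not a prescribed method. It assumes a small sample of production-like emails is available before launch (for example, from a beta test), which the case study does not state. The email lists and the names `candidate_a`, `candidate_b`, and `production_like` are hypothetical and do not represent the case study's two options. It measures how far each candidate dataset's token distribution is from the production-like sample using total variation distance.

```python
from collections import Counter


def token_distribution(emails):
    """Return the empirical token distribution of a list of email strings."""
    counts = Counter(
        token for email in emails for token in email.lower().split()
    )
    total = sum(counts.values())
    return {token: n / total for token, n in counts.items()}


def total_variation_distance(p, q):
    """Total variation distance between two discrete distributions.

    0.0 means the distributions are identical; 1.0 means they share no tokens.
    """
    vocabulary = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in vocabulary)


# Hypothetical toy data, invented purely for illustration.
production_like = [
    "win a free prize now",
    "meeting moved to friday",
    "reunion el viernes",
]
candidate_a = [
    "win a free prize now",
    "meeting moved to friday",
    "reunion el lunes",
]
candidate_b = [
    "call for papers deadline",
    "faculty seminar on friday",
    "grant report due",
]

target = token_distribution(production_like)
distance_a = total_variation_distance(token_distribution(candidate_a), target)
distance_b = total_variation_distance(token_distribution(candidate_b), target)

# The candidate with the smaller distance resembles the target sample more closely.
closer = "candidate_a" if distance_a < distance_b else "candidate_b"
print(f"distance_a={distance_a:.3f}, distance_b={distance_b:.3f}, closer={closer}")
```

A smaller distance suggests the candidate was drawn from a distribution closer to the one the model will face in production. Token frequencies are only a rough proxy: they say nothing about label balance or about how spam tactics change over time.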

Updated 2025-10-06
