Case Study

Evaluating Data Sourcing for a Spam Filter

A machine learning team is building a spam filter for a new global email service set to launch next month. They need to create training and test datasets to develop and validate their model. They have two options for sourcing their data. Evaluate the two options below and recommend which one is more likely to result in a model that performs well on real-world user emails after the service launches. Justify your recommendation based on the relationship between the sourced data and the data the model will encounter in production.
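One way to make the relationship between sourced data and production data concrete is to measure how far apart two collections of emails are. The sketch below is a minimal illustration, not part of the case study. It compares word-frequency distributions using Jensen-Shannon divergence, a crude proxy chosen here for simplicity. The function names (`word_distribution`, `js_divergence`), the sample names, and the email strings are all hypothetical placeholders, and the emails are made up for illustration.

```python
import math
from collections import Counter


def word_distribution(emails):
    """Return the relative frequency of each lowercase word across emails."""
    counts = Counter(word for text in emails for word in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two word distributions.

    0.0 means the distributions are identical; 1.0 means they share no words.
    """
    divergence = 0.0
    for word in set(p) | set(q):
        p_w, q_w = p.get(word, 0.0), q.get(word, 0.0)
        m_w = 0.5 * (p_w + q_w)  # mixture of the two distributions
        if p_w > 0:
            divergence += 0.5 * p_w * math.log2(p_w / m_w)
        if q_w > 0:
            divergence += 0.5 * q_w * math.log2(q_w / m_w)
    return divergence


# Hypothetical, made-up emails used only to exercise the functions.
production_like = ["win a free prize now", "meeting moved to friday"]
sample_x = ["free prize inside click now", "friday meeting agenda attached"]
sample_y = ["quarterly ledger reconciliation", "invoice batch export complete"]

target = word_distribution(production_like)
score_x = js_divergence(word_distribution(sample_x), target)
score_y = js_divergence(word_distribution(sample_y), target)

print(f"sample_x divergence from target: {score_x:.3f}")
print(f"sample_y divergence from target: {score_y:.3f}")
```

A lower score means the sample's vocabulary looks more like the target's. Word overlap alone will not settle the question posed above, since sender behavior, language mix, and label quality also matter, but the same comparison idea applies to whichever features the team cares about.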

Updated 2025-10-06
