Case Study

Leveraging known spammers to build a training dataset using a honeypot.

Case context: You are designing an email filtering service and need to collect a large training set of spam emails. You have access to a list of known spammer networks, but you currently have no labeled spam data with which to train your machine learning model.

Question: How can you implement a honeypot strategy to automatically build your spam dataset from these known spammers?

Sample answer: You can implement a honeypot by creating a set of fake email addresses and deliberately sending or exposing them to the known spam networks. Because no legitimate user owns these addresses, any email that arrives at them can be harvested automatically and confidently labeled as spam, yielding a large, cleanly labeled set of spam examples for training.

Key points:

  • Create fake email addresses.
  • Deliberately send or expose these addresses to the known spam networks.
  • Automatically harvest the incoming emails.
  • Confidently label the harvested emails as spam training data.
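The steps above can be sketched in Python as follows. This is a minimal illustration, not a production design: the domain `trap.example.com`, the function names `make_honeypot_addresses` and `label_if_honeypot`, and the output record format are all hypothetical. It also assumes that raw messages are already delivered to your own mail server and that the honeypot address appears in the `To` or `Cc` header; a real deployment would more likely match on the SMTP envelope recipient.

```python
import secrets
from email import message_from_string
from email.utils import getaddresses

# Hypothetical domain that you control and that has no real users.
HONEYPOT_DOMAIN = "trap.example.com"


def make_honeypot_addresses(n, domain=HONEYPOT_DOMAIN):
    """Generate n distinct random addresses that no legitimate user owns."""
    addresses = set()
    while len(addresses) < n:
        addresses.add(f"{secrets.token_hex(6)}@{domain}")
    return addresses


def label_if_honeypot(raw_email, honeypots):
    """Return a labeled record if the message was sent to a honeypot address.

    Returns None for mail addressed to anyone else, so only honeypot
    traffic ends up in the spam training set.
    """
    msg = message_from_string(raw_email)
    recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = {addr.lower() for _, addr in getaddresses(recipient_headers)}
    if recipients & honeypots:
        return {"text": raw_email, "label": "spam"}
    return None
```

In use, you would generate the addresses once, expose them to the known spam networks, and then run every message that reaches the honeypot domain through `label_if_honeypot`, appending the non-`None` results to the training set.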

Rubric: The response must detail the creation of fake email addresses, their deliberate distribution to known spammers, and the automatic collection and labeling of the resulting incoming emails as spam.

Updated 2026-06-13

Tags

Machine Learning

Deep Learning

Machine Learning Strategy

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Yearning @ DeepLearning.AI