Case Study

Selecting a Single Evaluation Metric for a Spam Classifier

Case context: You are developing a spam email classifier for which both precision (avoiding marking legitimate emails as spam) and recall (catching as many spam emails as possible) are critical to the project's success. Your team is struggling to compare models because some have high precision but low recall, while others have low precision but high recall.

Question: Based on Machine Learning Yearning, what evaluation strategy should you propose to resolve this difficulty, and how could you compute a concrete metric using a standard method?

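Sample answer: Propose combining precision and recall into a single-number evaluation metric so that models can be ranked directly. One standard way to compute it is to take the average of precision and recall for each model. Machine Learning Yearning also mentions the F1 score, a modified average (the harmonic mean of the two), as an alternative that works better than the simple mean.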
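As a rough illustration of combining the two numbers, the sketch below computes both the simple average and the F1 score (the harmonic mean) for each model and ranks the models by the chosen metric. The function names, the `models` dictionary, and the precision/recall values are assumptions made up for this example; they are not taken from the case itself.

```python
from typing import Callable, Dict, List, Tuple


def mean_precision_recall(precision: float, recall: float) -> float:
    """Arithmetic mean of precision and recall."""
    return (precision + recall) / 2.0


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def rank_models(
    scores: Dict[str, Tuple[float, float]],
    metric: Callable[[float, float], float],
) -> List[str]:
    """Return model names sorted from best to worst by the given metric."""
    return sorted(scores, key=lambda name: metric(*scores[name]), reverse=True)


# Hypothetical (precision, recall) pairs for two candidate spam classifiers.
models = {
    "A": (0.95, 0.90),
    "B": (0.98, 0.85),
}

if __name__ == "__main__":
    for name, (p, r) in models.items():
        print(
            f"model {name}: mean={mean_precision_recall(p, r):.3f} "
            f"F1={f1_score(p, r):.3f}"
        )
    print("ranking by F1:", rank_models(models, f1_score))
```

With a single number per model, the comparison reduces to sorting. The harmonic mean penalizes a model whose precision and recall are far apart more heavily than the simple average does, which is one reason F1 is often preferred when both quantities matter.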

Key points:

  • Propose combining precision and recall into a single evaluation metric.
  • Diagnose the difficulty of comparing multiple models using two separate metrics.
  • Suggest taking the average of precision and recall as a standard computation method.

Rubric: The answer must recommend combining precision and recall into a single number to resolve the model comparison issue, and suggest taking the average as the standard method for computation.

Updated 2026-05-27

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI