Case Study

Evaluating the suitability of the F1 score for a highly unbalanced dataset

Case context: You are developing a document classification system with highly unbalanced classes. While reviewing the performance metrics, a colleague suggests relying solely on the F1 score because it neatly combines precision and recall. However, another engineer points out that correctly identifying true negatives is a critical requirement for this specific document pipeline.

Question: Given the specific characteristics of the F1 score, diagnose the potential risks of using it as your sole metric in this highly unbalanced document classification scenario. What properties of the F1 score cause this issue?

Sample answer: Relying solely on the F1 score in this scenario is risky because F1 is computed only from true positives, false positives, and false negatives; true negatives never enter the calculation. In a highly unbalanced dataset where correctly identifying true negatives is critical, the F1 score therefore hides how the model performs on the negative class, which leaves the evaluation susceptible to unbalanced class bias. In addition, the F1 score weights precision and recall equally, which may not match the specific business needs of the classification pipeline.
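
For reference, the standard definition of F1 written in terms of confusion-matrix counts makes the omission explicit. The symbols TP, FP, FN, and TN (true positives, false positives, false negatives, true negatives) and P, R (precision, recall) are notation assumed here, not taken from the case study:

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
\]
```

TN does not appear in the final expression, so any change that affects only the true-negative count leaves F1 unchanged.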

Key points:

  • The F1 score does not take into account true negatives.
  • Because it ignores true negatives, it is susceptible to unbalanced class bias.
  • It weights precision and recall equally, regardless of their relative cost in the application.
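
The key points above can be illustrated with a minimal sketch. The confusion-matrix counts below are hypothetical numbers chosen for illustration, and `f1_score` and `specificity` are helper names introduced here, not part of the case study. The two scenarios share the same TP, FP, and FN but differ greatly in TN:

```python
def f1_score(tp, fp, fn):
    """F1 from confusion-matrix counts. Note that TN is not a parameter."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def specificity(tn, fp):
    """True-negative rate: the share of actual negatives correctly rejected."""
    denom = tn + fp
    return tn / denom if denom else 0.0


# Hypothetical counts (assumed for illustration only).
# Both scenarios have identical TP, FP, FN; only TN differs.
scenario_a = {"tp": 80, "fp": 20, "fn": 20, "tn": 9880}
scenario_b = {"tp": 80, "fp": 20, "fn": 20, "tn": 80}

f1_a = f1_score(scenario_a["tp"], scenario_a["fp"], scenario_a["fn"])
f1_b = f1_score(scenario_b["tp"], scenario_b["fp"], scenario_b["fn"])
spec_a = specificity(scenario_a["tn"], scenario_a["fp"])
spec_b = specificity(scenario_b["tn"], scenario_b["fp"])

print(f"Scenario A: F1={f1_a:.3f}, specificity={spec_a:.3f}")
print(f"Scenario B: F1={f1_b:.3f}, specificity={spec_b:.3f}")
```

Both scenarios yield the same F1, yet the share of negatives handled correctly differs substantially. When true negatives matter, it may help to report a TN-aware metric such as specificity alongside F1.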

Rubric: The learner should evaluate the situation and diagnose that the F1 score is flawed in this context specifically because it ignores true negatives, thereby leaving the system susceptible to unbalanced class bias.

Updated 2026-06-07

Tags

Data Science

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Machine Learning Yearning @ DeepLearning.AI