Case Study

How should an ML team handle evaluation when their main metric is found to be untrustworthy?

Case context: An image recognition team discovers that its classification accuracy metric does not reflect actual user satisfaction. Because accuracy treats all errors equally, it penalizes an offensive false positive no more than a harmless mistake, so a model that users would reject can still score best. As a result, the team no longer trusts the metric. Some team members propose manually inspecting candidate models and choosing the best one, while others want to establish a new metric.

Question: Drawing on Andrew Ng's guidance, evaluate the proposed options and explain what action the team should take, including how that action affects the team's workflow and how it defines the team's goal.

Sample answer: The team should not rely on manually choosing among classifiers. Instead, it should immediately pick a new metric (e.g., a weighted accuracy metric that penalizes errors on offensive images more heavily than other errors) and use it to explicitly define a new goal for the team. This gives the team a clear target that can be computed automatically and optimized directly, so it can keep iterating quickly instead of spending time on slow, subjective manual model selection.
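The weighted metric mentioned in the sample answer can be written as a weighted misclassification rate over the m_dev examples of the dev set. This is a sketch: the weight of 10 for offensive images is an illustrative choice rather than a prescribed value, and whether an image counts as offensive is assumed to come from human labeling. Dividing by the sum of the weights keeps the error between 0 and 1; the corresponding weighted accuracy is 1 minus this quantity.

```latex
\text{WeightedError}
  = \frac{1}{\sum_{i=1}^{m_{\text{dev}}} w^{(i)}}
    \sum_{i=1}^{m_{\text{dev}}} w^{(i)} \, \mathbb{1}\{\hat{y}^{(i)} \neq y^{(i)}\},
\qquad
w^{(i)} =
\begin{cases}
  1  & \text{if } x^{(i)} \text{ is not offensive} \\
  10 & \text{if } x^{(i)} \text{ is offensive}
\end{cases}
```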

Key points:

  • Rejects manual classifier selection as a viable long-term approach.
  • Recommends picking a new metric that addresses the issue.
  • Uses the new metric to explicitly define a new goal for the team.
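Once the new metric is the team's explicit goal, candidate models can be ranked by it automatically instead of by manual inspection. The sketch below is a minimal illustration, assuming each dev-set image carries a binary cat label and a hypothetical `offensive` flag from human labeling. The function names `accuracy`, `weighted_error`, and `select_best`, the toy data, and the weight of 10 are all illustrative assumptions, not part of the case study.

```python
from typing import Dict, Sequence, Tuple

# Illustrative weight: one error on an offensive image costs as much as
# ten ordinary errors. This value is an assumption, not a fixed rule.
OFFENSIVE_WEIGHT = 10.0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Plain accuracy: every error counts the same."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_error(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    offensive: Sequence[bool],
    offensive_weight: float = OFFENSIVE_WEIGHT,
) -> float:
    """Weighted misclassification rate, normalized by the total weight.

    Examples flagged as offensive (a hypothetical human-labeled flag)
    get weight `offensive_weight`; all other examples get weight 1.
    """
    if not (len(y_true) == len(y_pred) == len(offensive)):
        raise ValueError("y_true, y_pred and offensive must have equal length")
    weights = [offensive_weight if flag else 1.0 for flag in offensive]
    mistakes = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    return mistakes / sum(weights)


def select_best(
    predictions: Dict[str, Sequence[int]],
    y_true: Sequence[int],
    offensive: Sequence[bool],
) -> Tuple[str, Dict[str, float]]:
    """Rank candidate models by weighted error; the lowest error wins."""
    scores = {
        name: weighted_error(y_true, preds, offensive)
        for name, preds in predictions.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy dev set: label 1 = cat; the fourth image is offensive (not a cat).
    y_true = [1, 1, 0, 0, 0]
    offensive = [False, False, False, True, False]
    candidates = {
        "A": [1, 1, 0, 1, 0],  # one error: shows the offensive image as a cat
        "B": [1, 0, 1, 0, 0],  # two errors, both on ordinary images
    }
    for name, preds in candidates.items():
        print(name, accuracy(y_true, preds),
              round(weighted_error(y_true, preds, offensive), 3))
    # A 0.8 0.714
    # B 0.6 0.143
    best, _ = select_best(candidates, y_true, offensive)
    print("best:", best)  # best: B
```

In this toy example, plain accuracy prefers model A (0.8 vs 0.6), while the weighted error prefers model B (0.143 vs 0.714), because A's single mistake is an offensive false positive. That rank reversal is the situation described in the case context, and once the metric encodes it, choosing between models no longer requires manual inspection.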

Rubric: The evaluation must identify that manual selection should be avoided and must recommend choosing a new metric to explicitly define a new goal.


Updated 2026-05-26


Tags

Machine Learning

Deep Learning

Machine Learning Strategy

Supervised Learning

Dive into Deep Learning @ D2L

Data Science
