1Cademy - How should an ML team handle evaluation when their main metric is found to be untrustworthy?

Learn Before

Choosing a New Trusted Metric Instead of Manual Classifier Selection

Case Study

Case context: An image recognition team discovers that classification accuracy, its main metric, does not reflect actual user satisfaction: the metric counts every error equally, so an offensive false positive costs no more than a harmless mistake. As a result, the team no longer trusts the metric. Some team members propose inspecting the models manually and choosing the best one by hand, while others want to establish a new metric.

Question: Evaluate the proposed options and explain what action the team should take. Following Andrew Ng's guidance, describe how this action affects the team's workflow and goal definition.

Sample answer: The team should not proceed with manually choosing among classifiers. Instead, they should immediately pick a new metric (e.g., a weighted accuracy metric that penalizes offensive errors) and use it to explicitly define a new goal for the team. This ensures the team has a clear, automated target to optimize, rather than wasting time on manual, subjective model selection.
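
The weighted metric mentioned in the sample answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's actual metric: the function name `weighted_error`, the per-example `is_offensive` flag, the weight of 10, and the toy labels are all hypothetical. It shows how a classifier that looks better on plain error can rank worse once offensive mistakes are penalized more heavily.

```python
def weighted_error(y_true, y_pred, is_offensive, offensive_weight=10.0):
    """Weighted classification error (lower is better).

    Each example gets weight `offensive_weight` if it is flagged as
    offensive and 1.0 otherwise. The weight of 10 is an assumed value;
    a real team would choose it to match how costly such errors are.
    """
    weights = [offensive_weight if flag else 1.0 for flag in is_offensive]
    # Sum the weights of the misclassified examples only.
    penalty = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    return penalty / sum(weights)


# Toy evaluation set (hypothetical): 1 = target class, 0 = anything else.
y_true       = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
is_offensive = [False] * 8 + [True, True]

# Classifier A makes one mistake, on an offensive image.
pred_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
# Classifier B makes two mistakes, both on harmless images.
pred_b = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]

candidates = {"A": pred_a, "B": pred_b}

# Plain error: every mistake weighs the same (weight 1.0 everywhere).
best_plain = min(
    candidates,
    key=lambda name: weighted_error(y_true, candidates[name], is_offensive, 1.0),
)

# Weighted error: offensive mistakes count ten times as much.
best_weighted = min(
    candidates,
    key=lambda name: weighted_error(y_true, candidates[name], is_offensive),
)

print("best by plain error:   ", best_plain)
print("best by weighted error:", best_weighted)
```

Because the ranking is computed by a single function, the choice between classifiers stays automated: the team changes the metric once and then optimizes against it, instead of re-inspecting models by hand.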

Key points: