How should an ML team handle evaluation when their main metric is found to be untrustworthy?
Case context: An image recognition team discovers that their classification accuracy metric does not reflect actual user satisfaction because it treats all errors equally, including offensive false positives. As a result, the team no longer trusts the metric. Some team members propose manually inspecting and choosing the best models, while others want to establish a new metric.
Question: Evaluate the proposed options and explain what action the team should take, detailing how this action impacts team workflow and goal definition according to Andrew Ng.
Sample answer: The team should not proceed with manually choosing among classifiers. Instead, they should immediately pick a new metric (e.g., a weighted accuracy metric that penalizes offensive errors) and use it to explicitly define a new goal for the team. This ensures the team has a clear, automated target to optimize, rather than wasting time on manual, subjective model selection.
Key points:
- Rejects manual classifier selection as a viable long-term approach.
- Recommends picking a new metric that addresses the issue.
- Uses the new metric to explicitly define a new goal for the team.
Rubric: The evaluation must identify that manual selection should be avoided and recommend choosing a new metric to explicitly define a new goal.
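Worked example: the weighted metric mentioned in the sample answer could be sketched as below. This is a minimal illustration, not the team's actual metric. The function name `weighted_error`, the 10x weight, the `sensitive` flags, and the toy labels are all assumptions made up for the example. The idea is that each example carries a weight, and an error on an example where a mistake would be offensive counts more than an ordinary error.

```python
from typing import Sequence


def weighted_error(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Weighted misclassification rate: sum of weights on wrong examples / sum of all weights."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("y_true, y_pred and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    wrong = sum(w for t, p, w in zip(y_true, y_pred, weights) if t != p)
    return wrong / total


# Toy dev set (hypothetical). `sensitive` marks examples where an error would be offensive.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
sensitive = [False, False, False, True, False, False, False, True]
weights = [10.0 if s else 1.0 for s in sensitive]  # assumed 10x penalty

# Model A makes one error, on a sensitive example; model B makes two ordinary errors.
model_a = [1, 1, 0, 1, 1, 0, 1, 0]
model_b = [0, 1, 1, 0, 1, 0, 1, 0]

uniform = [1.0] * len(y_true)
plain = {
    "model_a": weighted_error(y_true, model_a, uniform),
    "model_b": weighted_error(y_true, model_b, uniform),
}
weighted = {
    "model_a": weighted_error(y_true, model_a, weights),
    "model_b": weighted_error(y_true, model_b, weights),
}

# The team goal becomes "minimize the weighted error", so selection is automatic.
best_plain = min(plain, key=plain.get)
best_weighted = min(weighted, key=weighted.get)
print(best_plain, best_weighted)
```

With unweighted error, model A looks better (1 error in 8 versus 2 in 8). Under the weighted metric the ranking flips, because A's single error falls on a sensitive example. That flip is what lets the team replace subjective manual selection with a single number to optimize.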
Tags
Machine Learning
Deep Learning
Machine Learning Strategy
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Related
When an ML team's evaluation metric is no longer trusted, what does Andrew Ng strongly recommend?
True or False: Andrew Ng considers it acceptable for a team to proceed for an extended period without a trusted metric by manually selecting among classifiers.
When a metric is no longer trusted, Andrew Ng recommends picking a new _____ to explicitly define a new goal for the team.
What does Andrew Ng recommend when a metric is no longer trusted?
It is acceptable to proceed for a long time without a trusted metric by manually choosing among classifiers.
Andrew Ng recommends picking a new _____ to explicitly define a new team goal when the current one is untrustworthy.
Match each concept to its role when a project metric becomes untrustworthy.
Order the steps for recovering from an untrustworthy metric per Andrew Ng's recommendation.
What is the main purpose of using a replacement metric to explicitly define a new team goal?
When a metric becomes untrustworthy, Andrew Ng's recommended first action is to immediately halt all model training.
Ng advises against proceeding for too long without a trusted metric and _____ to manually choosing among classifiers.
Match each team behavior to its consequence when a project metric is untrustworthy.
Order the reasoning steps that justify replacing an untrustworthy metric rather than using manual classifier selection.
How does establishing a new trusted metric prevent issues associated with manually choosing classifiers during model selection?
Why is defining a new team goal with a replacement metric preferred over manual classifier selection?