Case Study

Diagnose the methodological error in a team's test set evaluation process.

Case context: A machine learning team evaluates their speech recognition model on the test set once a month to monitor long-term progress. During the most recent monthly evaluation, they noticed that the model's accuracy on the test set dropped slightly compared to the previous month. The project manager immediately ordered the team to reject the new model and roll back to the previous month's version of the system.

Question: Identify the methodological error committed by the team in this scenario. What are the specific long-term consequences of this decision on the reliability of their evaluation metrics?

Sample answer: The methodological error was using the test set to make an algorithm decision (the decision to roll back to a previous version). This practice leads to overfitting to the test set. As a result, the test set can no longer provide a completely unbiased estimate of the system's actual performance.

Key points:

  • The team used test set performance to decide on an algorithmic rollback.
  • This decision process causes the system to overfit to the test set.
  • The test set is no longer reliable for providing an unbiased estimate of performance.

Rubric: The learner must diagnose that the team used the test set for algorithm decisions (rolling back the system). They must explain that this causes overfitting and invalidates the test set's capacity to offer an unbiased estimate of system performance.

0

1

Updated 2026-05-26

Contributors are:

Who are from:

Tags

Machine Learning

Deep Learning

Machine Learning Strategy

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Related