Learn Before
Concept

Multiple Hypothesis Testing in Model Evaluation

When evaluating multiple classifiers, such as $f_1, \dots, f_k$, on the same test dataset, the probability of obtaining a misleading test set performance score for at least one model increases significantly compared to evaluating a single classifier. For a single classifier $f$, a practitioner might be highly confident that its empirical test error $\epsilon_\mathcal{D}(f)$ is close to its true population error $\epsilon(f)$. However, as the number of classifiers $k$ grows, the risk of a false discovery compounds, making it difficult to guarantee that the best-performing model did not simply achieve its seemingly low error rate by chance. This phenomenon directly relates to the statistical challenge of multiple hypothesis testing.
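The compounding risk can be seen in a small simulation. The sketch below (all parameters are illustrative assumptions, not from the original text) evaluates $k$ classifiers that all share the same true error $\epsilon(f) = 0.3$ on a test set of $n = 100$ examples, and estimates the probability that at least one of them shows an empirical error $\epsilon_\mathcal{D}(f)$ well below its true error purely by chance:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100          # test set size (illustrative assumption)
true_err = 0.3   # shared true population error of every classifier (assumption)
trials = 10_000  # Monte Carlo repetitions

def prob_misleading(k, gap=0.1):
    """Estimate the probability that at least one of k classifiers,
    each with true error `true_err`, reports an empirical test error
    at least `gap` below the truth on a test set of size n."""
    # Each empirical error is the mean of n Bernoulli(true_err) outcomes.
    errs = rng.binomial(n, true_err, size=(trials, k)) / n
    # A "misleading" run is one where the best-looking classifier
    # beats its true error by at least `gap`.
    return np.mean(errs.min(axis=1) <= true_err - gap)

for k in (1, 10, 100):
    print(f"k={k:4d}  P(at least one misleading score) ≈ {prob_misleading(k):.3f}")
```

For a single classifier the event is rare, but with many classifiers the minimum of many noisy scores is almost guaranteed to look deceptively good, which is why multiple-testing corrections (e.g. union bounds over the $k$ hypotheses) are needed when picking the best model on a shared test set.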

Updated 2026-05-03

Tags

D2L

Dive into Deep Learning @ D2L