Learn Before
Concept

Multiple Hypothesis Testing in Model Evaluation

When evaluating multiple classifiers, such as $f_1, \dots, f_k$, on the same test dataset, the probability of obtaining a misleading test set performance score for at least one model increases significantly compared to evaluating a single classifier. For a single classifier $f$, a practitioner might be highly confident that its empirical test error $\epsilon_\mathcal{D}(f)$ is close to its true population error $\epsilon(f)$. However, as the number of classifiers $k$ grows, the risk of a false discovery compounds, making it difficult to guarantee that the best-performing model did not simply achieve its seemingly low error rate by chance. This phenomenon directly relates to the statistical challenge of multiple hypothesis testing.
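The compounding risk can be seen in a small simulation. The sketch below (all parameters are illustrative assumptions, not from the original text) evaluates $k$ classifiers that all share the same true error $\epsilon(f) = 0.3$ on a test set of $n = 100$ examples, and estimates the probability that at least one of them shows an empirical error $\epsilon_\mathcal{D}(f)$ well below its true error purely by chance:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100          # test set size (illustrative assumption)
true_err = 0.3   # shared true population error of every classifier (assumption)
trials = 10_000  # Monte Carlo repetitions

def prob_misleading(k, gap=0.1):
    """Estimate the probability that at least one of k classifiers,
    each with true error `true_err`, reports an empirical test error
    at least `gap` below the truth on a test set of size n."""
    # Each empirical error is the mean of n Bernoulli(true_err) outcomes.
    errs = rng.binomial(n, true_err, size=(trials, k)) / n
    # A "misleading" run is one where the best-looking classifier
    # beats its true error by at least `gap`.
    return np.mean(errs.min(axis=1) <= true_err - gap)

for k in (1, 10, 100):
    print(f"k={k:4d}  P(at least one misleading score) ≈ {prob_misleading(k):.3f}")
```

For a single classifier the event is rare, but with many classifiers the minimum of many noisy scores is almost guaranteed to look deceptively good, which is why multiple-testing corrections (e.g. union bounds over the $k$ hypotheses) are needed when picking the best model on a shared test set.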

Updated 2026-05-03

Tags

D2L

Dive into Deep Learning @ D2L