Statistical Protocol for Hierarchical Prerequisite Graph RAG: 5,000 Paired Bootstrap Resamples and Holm–Bonferroni

In this paper, confidence intervals and paired system comparisons on LectureBank-Full, MOOC-CS, and QASC are computed with 5,000 paired bootstrap resamples over questions. On each resample, both systems being compared are re-evaluated on exactly the same resampled question indices, and the empirical distribution of the per-resample paired deltas yields the 95% percentile CIs reported in the headline tables. To control the family-wise error rate when many system pairs are compared at once, Holm–Bonferroni corrections are applied within comparison families (the set of system pairs reported together in one comparison block); the corrected results are presented in the supplementary tables rather than in the headline numbers.
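A minimal sketch of this protocol follows. It assumes that per-question scores (for example 0/1 correctness) are already available for each system on the same question set. The names `paired_bootstrap`, `holm_bonferroni`, and `_percentile` are hypothetical. The text does not say how the p-values fed into Holm–Bonferroni are obtained, so the two-sided bootstrap p-value with a +1 correction used here is an assumption, as is the interpolation-free percentile.

```python
import random
from collections.abc import Sequence


def _percentile(sorted_vals: list[float], q: float) -> float:
    # Index-based percentile without interpolation (a simplifying assumption).
    k = round(q * (len(sorted_vals) - 1))
    return sorted_vals[k]


def paired_bootstrap(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_resamples: int = 5000,
    alpha: float = 0.05,
    seed: int = 0,
) -> dict[str, float]:
    """Paired bootstrap over questions for the mean delta (A minus B)."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both systems must be scored on the same non-empty question set")
    n = len(scores_a)
    rng = random.Random(seed)
    observed = sum(a - b for a, b in zip(scores_a, scores_b)) / n
    deltas = []
    for _ in range(n_resamples):
        # One index draw shared by BOTH systems: this is what makes it paired.
        idx = [rng.randrange(n) for _ in range(n)]
        deltas.append(sum(scores_a[i] - scores_b[i] for i in idx) / n)
    deltas.sort()
    # Hypothetical two-sided bootstrap p-value with a +1 correction (assumption).
    n_le = sum(d <= 0 for d in deltas)
    n_ge = sum(d >= 0 for d in deltas)
    p = min(1.0, 2 * (min(n_le, n_ge) + 1) / (n_resamples + 1))
    return {
        "delta": observed,
        "ci_low": _percentile(deltas, alpha / 2),       # 2.5th percentile
        "ci_high": _percentile(deltas, 1 - alpha / 2),  # 97.5th percentile
        "p_value": p,
    }


def holm_bonferroni(
    pvalues: Sequence[float], alpha: float = 0.05
) -> tuple[list[float], list[bool]]:
    """Holm step-down adjusted p-values and reject decisions for ONE family."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running max keeps adjusted p-values monotone in rank.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted, [p <= alpha for p in adjusted]


if __name__ == "__main__":
    # Toy per-question 0/1 scores (hypothetical data, not from the paper).
    system = [1, 0, 1, 1, 0, 1, 1, 1]
    family = {"vs_sys_b": [0, 0, 1, 0, 0, 1, 0, 1], "vs_sys_c": [1, 0, 1, 0, 0, 1, 1, 1]}
    results = {name: paired_bootstrap(system, other) for name, other in family.items()}
    adj, rej = holm_bonferroni([r["p_value"] for r in results.values()])
    for (name, r), p_adj, reject in zip(results.items(), adj, rej):
        print(name, r, p_adj, reject)
```

Holm–Bonferroni is called once per comparison family, so p-values from different comparison blocks are never pooled into a single correction.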

Updated 2026-05-17

Tags: Science

Auditable Strict-Parity Evaluation of Prerequisite-Graph Retrieval for RAG under Leakage Controls