Statistical Protocol for Hierarchical Prerequisite Graph RAG: 5,000 Paired Bootstrap Resamples and Holm–Bonferroni
In this paper, confidence intervals and paired system comparisons on LectureBank-Full, MOOC-CS, and QASC are computed from 5,000 paired bootstrap resamples over questions. On each resample, both systems under comparison are re-evaluated on exactly the same resampled question indices, and the empirical distribution of the per-resample paired deltas yields the 95% percentile CIs reported in the headline tables. To control the family-wise error rate when many system pairs are compared at once, Holm–Bonferroni corrections are applied within each comparison family (the set of system pairs reported together in one comparison block). The corrected results appear in the supplementary tables rather than in the main headline numbers.
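The protocol described above can be illustrated with a minimal sketch. It is not the paper's code. The function names (`paired_bootstrap_ci`, `holm_bonferroni`), the per-question score lists, the seed, and the nearest-rank percentile rule are all assumptions made for illustration. The source also does not say how p-values are obtained for each system pair, so the Holm–Bonferroni step simply takes them as given inputs.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile CI for mean(scores_a) - mean(scores_b) under paired resampling.

    scores_a / scores_b are hypothetical per-question metric values for two
    systems, aligned so that index i refers to the same question in both.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have equal length")
    n = len(scores_a)
    rng = random.Random(seed)
    deltas = []
    for _ in range(n_resamples):
        # Draw one set of question indices (with replacement) and reuse it
        # for both systems, so every delta is a paired comparison.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        deltas.append(mean_a - mean_b)
    deltas.sort()
    # Nearest-rank percentile bounds (an assumed convention; others exist).
    lo = deltas[int((alpha / 2) * (n_resamples - 1))]
    hi = deltas[int((1 - alpha / 2) * (n_resamples - 1))]
    return lo, hi, deltas


def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down decisions for one comparison family, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is tested at alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # Step-down rule: stop at the first non-rejection.
    return reject
```

Calling `paired_bootstrap_ci` once per system pair and `holm_bonferroni` once per comparison family mirrors the separation in the text between headline CIs and corrected supplementary results. A fixed seed keeps the resampled indices reproducible across runs.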
Tags
Science
Auditable Strict-Parity Evaluation of Prerequisite-Graph Retrieval for RAG under Leakage Controls
Related
Gate-Only Ablation Result: +1.0 R@10 on LectureBank-Full, Tied on MOOC-CS
CPU-Only Latency Protocol (Apple M4 Max, Caches Disabled, 200 Measured Queries After Warm-Up, Three Repeats)
Primary and Secondary Retrieval Metrics in Hierarchical Prerequisite Graph RAG