Learn Before
Retrieval-Only Scope of Headline Claims
Analysis And Limitations in Auditable Strict-Parity Evaluation of Prerequisite-Graph Retrieval for RAG under Leakage Controls
Method-Related Scope Notes for Adaptive Depth Gating for Prerequisite Retrieval (Auditable Strict-Parity Graph-RAG Paper)
HotpotQA Multi-Hop QA Benchmark
HotpotQA External-Validity Probe: Adaptive Depth Does Not Transfer to a Denser Non-Prerequisite Graph (FullWiki-1k: Flat 93.4 / Hier 92.9 / Adaptive 94.0 R@10)
On the aligned HotpotQA FullWiki-1k boundary-probe slice, the three strict-parity rows are essentially indistinguishable at R@10: Flat dense = 93.4, Hierarchical baseline = 92.9, Adaptive (heuristic) = 94.0. The auxiliary probe does not reproduce a distinctive adaptive-depth advantage, indicating that the curated prerequisite ranking does not transfer cleanly to a denser, non-prerequisite graph. The probe therefore bounds the external validity of the paper's adaptive-depth story to curated, template-based prerequisite retrieval rather than open-domain multi-hop retrieval.
0
1
Tags
Science
Auditable Strict-Parity Evaluation of Prerequisite-Graph Retrieval for RAG under Leakage Controls
Related
Generation as Context-Quality Diagnostic, Not a Headline Claim
Claims Explicitly Avoided: Auto-Judge Validation, Semantic-Evasion Effects, End-to-End QA Gains
Qualitative Manual Bundles Remain Non-Evidentiary Until Multi-Annotator Replication
Analysis-Section Scope Statement: Evidence Specific to Curated, Template-Based Prerequisite QA
HotpotQA External-Validity Probe: Adaptive Depth Does Not Transfer to a Denser Non-Prerequisite Graph (FullWiki-1k: Flat 93.4 / Hier 92.9 / Adaptive 94.0 R@10)
LectureBank-Full Decomposition: Diffusion+Quotas Drive ~18 R@10 Points; Contrast Gating Adds At Most ~1 Point (Statistically Tied)
Bounded Benchmark Validity: Two Question Families and 21/18 Unique Held-Out Targets Cap Statistical Power
Restrained Claim Scope: No Automatic-Judge Validation, No Semantic-Evasion Claim, No Robust End-to-End QA Gains
Language-Matched Seeding as a Prerequisite for Graph-Expansion Gains
HotpotQA External-Validity Probe: Adaptive Depth Does Not Transfer to a Denser Non-Prerequisite Graph (FullWiki-1k: Flat 93.4 / Hier 92.9 / Adaptive 94.0 R@10)
LectureBank-Full Tight-Budget Advantage of Adaptive Depth Gating (Mean ΔR@k = +2.13 over k∈{1,2,3,4})
When Adaptive Depth Helps vs When Fixed-Depth Is Safer (Budget/Cap Guidance)
Three Durable Signals of Strict-Parity Prerequisite Retrieval
Adaptive Depth Gating Helps Only in Tight-k Settings on Headline Prerequisite Benchmarks
HotpotQA External-Validity Probe: Adaptive Depth Does Not Transfer to a Denser Non-Prerequisite Graph (FullWiki-1k: Flat 93.4 / Hier 92.9 / Adaptive 94.0 R@10)
Induced HotpotQA Slice as Auxiliary Stress Test in Supplement
HotpotQA FullWiki-1k Boundary Probe: Flat 93.4, Hierarchical 92.9, Adaptive 94.0 R@10
HotpotQA External-Validity Probe: Adaptive Depth Does Not Transfer to a Denser Non-Prerequisite Graph (FullWiki-1k: Flat 93.4 / Hier 92.9 / Adaptive 94.0 R@10)