Example

HotpotQA External-Validity Probe: Adaptive Depth Does Not Transfer to a Denser Non-Prerequisite Graph (FullWiki-1k: Flat 93.4 / Hier 92.9 / Adaptive 94.0 R@10)

On the aligned HotpotQA FullWiki-1k boundary-probe slice, the three strict-parity rows are essentially indistinguishable at R@10: Flat dense = 93.4, Hierarchical baseline = 92.9, Adaptive (heuristic) = 94.0. The auxiliary probe does not reproduce a distinctive adaptive-depth advantage, indicating that the curated prerequisite ranking does not transfer cleanly to a denser, non-prerequisite graph. The probe therefore bounds the external validity of the paper's adaptive-depth story to curated, template-based prerequisite retrieval rather than open-domain multi-hop retrieval.

0

1

Updated 2026-05-18

Contributors are:

Who are from:

Tags

Science

Auditable Strict-Parity Evaluation of Prerequisite-Graph Retrieval for RAG under Leakage Controls

Related