Generation as Context-Quality Diagnostic, Not a Headline Claim
In this paper, end-to-end generation is retained only as a diagnostic of context quality, not as a headline claim. Generation numbers are reported for two purposes: to indicate whether retrieved contexts preserve answerable evidence, and to show how reference style affects surface-form metrics. They are explicitly excluded from the strict-parity headline retrieval comparisons. This scoping is what lets the paper read low EM, ROUGE-L, and BLEU values as artifacts of reference style rather than as evidence against the underlying retrieval system.
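To illustrate why a surface-form metric can be low even when the evidence is intact, the following minimal sketch contrasts a SQuAD-style exact-match check with a simple check for whether the reference answer still appears in a text. It is not the paper's evaluation code: the function names, the normalization rules, and the example strings are all assumptions made for illustration.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    """Surface-form metric: true only if the normalized strings are identical."""
    return normalize(prediction) == normalize(reference)


def contains_answer(text: str, reference: str) -> bool:
    """Diagnostic: does the text contain the reference answer as a token span?"""
    ref = normalize(reference)
    if not ref:
        return False
    # Pad with spaces so matches respect token boundaries.
    return f" {ref} " in f" {normalize(text)} "


# Hypothetical example: a terse reference versus a full-sentence generation.
reference = "Linear algebra"
context = "Linear algebra is a prerequisite for machine learning."
prediction = "You should study linear algebra first, since it is a prerequisite."

em = exact_match(prediction, reference)
evidence_in_context = contains_answer(context, reference)
answer_in_prediction = contains_answer(prediction, reference)
```

In this hypothetical case the generation states the right answer and the context preserves the evidence, yet exact match fails because the reference is a short phrase and the generation is a full sentence. That is the kind of reference-style artifact the diagnostic framing is meant to separate from retrieval quality.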
Tags
Science
Auditable Strict-Parity Evaluation of Prerequisite-Graph Retrieval for RAG under Leakage Controls
Related
Claims Explicitly Avoided: Auto-Judge Validation, Semantic-Evasion Effects, End-to-End QA Gains
Qualitative Manual Bundles Remain Non-Evidentiary Until Multi-Annotator Replication
Analysis-Section Scope Statement: Evidence Specific to Curated, Template-Based Prerequisite QA
HotpotQA External-Validity Probe: Adaptive Depth Does Not Transfer to a Denser Non-Prerequisite Graph (FullWiki-1k: Flat 93.4 / Hier 92.9 / Adaptive 94.0 R@10)