Learn Before
QASC Question Answering Benchmark
ColBERTv2/RePlug Reranking Baseline for Strict-Parity Prerequisite Retrieval
QASC Validation: Reranking Remains Stronger Than Either Hierarchical Or Adaptive Traversal (Results) in Auditable Strict-Parity Evaluation of Prerequisite-Graph Retrieval for RAG under Leakage Controls
QASC Strict-Parity Result: ColBERTv2/RePlug Strongest (R@10 = 85.0 [83.4, 86.6])
On the QASC validation split, the strongest strict-parity retrieval system the paper reports is ColBERTv2/RePlug at R@10 = 85.0 (95% paired-bootstrap CI [83.4, 86.6]). Because ColBERTv2/RePlug adds an extra learned reranker on top of the matched dense interface, it is reported under the SP+ type rather than the pure strict-parity SP type, so its headline number should be read alongside the SP systems rather than compared directly against them. Strict parity still holds the encoder, candidate pool, cutoff, matching rule, and split policy fixed across systems, so the comparison isolates the retriever + reranking pipeline.
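As a rough illustration of how an R@10 point estimate and a 95% bootstrap interval of this kind can be computed, here is a minimal sketch. It is not the paper's evaluation code. The function names, the percentile method, the number of resamples, and the per-question definition of recall are all assumptions made for the example. The paired variant resamples per-question score differences, which is equivalent to drawing the same question indices for both systems.

```python
import random

def recall_at_k(ranked_ids, gold_ids, k=10):
    """Fraction of gold facts found among the top-k retrieved ids (assumed definition)."""
    top = set(ranked_ids[:k])
    return sum(1 for g in gold_ids if g in top) / len(gold_ids)

def bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-question scores.

    Returns (mean, lower, upper). Questions are resampled with replacement.
    """
    rng = random.Random(seed)
    n = len(scores)
    means = []
    for _ in range(n_boot):
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(scores) / n, lower, upper

def paired_delta_ci(scores_a, scores_b, n_boot=2000, alpha=0.05, seed=0):
    """Paired bootstrap CI for mean(scores_a - scores_b) over the same questions."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must cover the same questions")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return bootstrap_ci(diffs, n_boot=n_boot, alpha=alpha, seed=seed)
```

Under these assumptions, `bootstrap_ci` applied to one system's per-question recall values gives an interval of the form reported for ColBERTv2/RePlug. `paired_delta_ci` applied to two systems scored on the same questions gives a delta interval of the kind listed for Adaptive vs Hierarchical. An interval that contains zero, such as [-0.5, +1.5], means the two systems cannot be separated at that confidence level.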
Tags
Science
Auditable Strict-Parity Evaluation of Prerequisite-Graph Retrieval for RAG under Leakage Controls
Related
QASC Directed Science Fact Graph Reconstruction (16,444 Nodes, 25,590 Edges)
QASC Strict-Parity Result: ColBERTv2/RePlug Strongest (R@10 = 85.0 [83.4, 86.6])
QASC Generation Diagnostic: TF-IDF Multiple-Choice Scorer 76.8% (Hierarchical) vs 74.6% (Adaptive)
QASC Conclusion: Reranking Beats Hierarchical and Adaptive Graph Traversal
QASC Paired Delta: Adaptive vs Hierarchical Baseline = +0.5 [-0.5, +1.5]