Example

QASC Generation Diagnostic: TF-IDF Multiple-Choice Scorer 76.8% (Hierarchical) vs 74.6% (Adaptive)

On QASC, a deterministic TF-IDF multiple-choice scorer (no LLM judge) serves as a generation diagnostic. It reaches 76.8% accuracy with context retrieved by the hierarchical baseline and 74.6% with adaptively retrieved context, which is 2.2 percentage points lower. The check is interpreted only as a boundary-condition diagnostic of whether the retrieved contexts preserve answerable evidence, not as a headline generation claim. Using a deterministic TF-IDF scorer keeps the diagnostic reproducible and independent of LLM-judge bias.
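To make the idea of a deterministic TF-IDF multiple-choice scorer concrete, here is a minimal sketch in Python. It is not the scorer used in the paper, whose exact formulation is not given here. It assumes that each answer choice is appended to the question, that the resulting query is compared to each retrieved passage by TF-IDF cosine similarity, and that the choice with the highest best-passage similarity is selected, with ties broken by the lowest index. The function names (`tokenize`, `score_choices`, `predict_choice`, `accuracy`), the smoothed IDF formula, and the item layout are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; tokens unseen in the retrieved passages are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def score_choices(question, choices, passages):
    """Score each choice by its best cosine match against the retrieved passages."""
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    doc_freq = Counter(t for d in docs for t in set(d))
    # Smoothed IDF (an assumption of this sketch), fitted on the passages only.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    doc_vecs = [tfidf_vector(d, idf) for d in docs]
    scores = []
    for choice in choices:
        query = tfidf_vector(tokenize(question + " " + choice), idf)
        scores.append(max((cosine(query, v) for v in doc_vecs), default=0.0))
    return scores


def predict_choice(question, choices, passages):
    """Index of the top-scoring choice; max() keeps the first index on ties."""
    scores = score_choices(question, choices, passages)
    return max(range(len(choices)), key=lambda i: scores[i])


def accuracy(items):
    """Fraction of items whose predicted index equals the gold answer index."""
    hits = sum(
        predict_choice(it["question"], it["choices"], it["passages"]) == it["answer"]
        for it in items
    )
    return hits / len(items)
```

Because the scorer has no random component and no model call, running it twice on the same retrieved contexts gives the same accuracy, so a difference between two retrieval methods can only come from the contexts themselves. When the retrieved passages contain none of the words of an answer choice, that choice gains no score from them, which is the sense in which such a scorer probes whether answerable evidence survived retrieval.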


Updated 2026-05-16


Tags

Science

Auditable Strict-Parity Evaluation of Prerequisite-Graph Retrieval for RAG under Leakage Controls