1Cademy - QASC Generation Diagnostic: TF-IDF Multiple-Choice Scorer 76.8% (Hierarchical) vs 74.6% (Adaptive)

Example

On QASC, a deterministic TF-IDF multiple-choice scorer (no LLM judge) serves as a boundary-condition generation diagnostic. It reaches 76.8% accuracy with the hierarchical baseline's retrieved context and 74.6% with the adaptive retrieved context, a gap of 2.2 percentage points. The check indicates only whether the retrieved contexts preserve answerable evidence; it is not a headline generation claim. Using a deterministic TF-IDF scorer keeps the diagnostic reproducible and independent of LLM-judge bias.
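To make the idea of a deterministic TF-IDF multiple-choice scorer concrete, here is a minimal sketch in pure Python. The source does not specify the scorer's exact formulation, so everything in it is an assumption for illustration: the function names (`tokenize`, `tfidf_vectors`, `pick_choice`, `accuracy`), the smoothed IDF, the scoring rule (maximum cosine similarity between "question + choice" and any retrieved passage), and the toy data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per tokenized document (smoothed IDF)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def pick_choice(question, choices, context):
    """Return the index of the choice best supported by the retrieved context.

    Assumed rule (not from the source): score each choice by the maximum
    cosine similarity between "question + choice" and any context passage.
    Ties go to the lowest index, so the result is fully deterministic.
    """
    passages = [tokenize(p) for p in context]
    queries = [tokenize(f"{question} {c}") for c in choices]
    vecs = tfidf_vectors(passages + queries)
    p_vecs, q_vecs = vecs[:len(passages)], vecs[len(passages):]
    scores = [max((cosine(q, p) for p in p_vecs), default=0.0) for q in q_vecs]
    return max(range(len(choices)), key=lambda i: (scores[i], -i))


def accuracy(items):
    """Fraction of items where the picked choice matches the gold answer index."""
    hits = sum(
        pick_choice(it["question"], it["choices"], it["context"]) == it["answer"]
        for it in items
    )
    return hits / len(items)


# Hypothetical toy item, not taken from QASC.
toy_items = [
    {
        "question": "What measures wind speed?",
        "choices": ["thermometer", "anemometer", "barometer"],
        "context": [
            "An anemometer measures wind speed.",
            "A thermometer measures temperature.",
        ],
        "answer": 1,
    }
]
```

Under these assumptions, running `accuracy` once per retrieval method over the same question set would give two comparable numbers. Because no sampling or model call is involved, repeated runs on the same inputs return the same result.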

Updated 2026-05-16

References

Reference: Auditable Strict-Parity Evaluation of Prerequisite-Graph Retrieval for RAG under Leakage Controls
