1Cademy - Canonical Prerequisite Splits Are Heavily Templated: 92/80 LectureBank-Full and 68/60 MOOC-CS Train-Test Overlaps

Learn Before

Method Part 2: Question-Disjoint and Target-Concept-Disjoint (Auditable Strict-Parity Graph-RAG Paper)
Datasets (Experimental Setup) in Auditable Strict-Parity Evaluation of Prerequisite-Graph Retrieval for RAG under Leakage Controls

Example

The canonical prerequisite splits used to train and evaluate prerequisite-QA systems are heavily templated, which produces measurable train-test leakage. On LectureBank-Full, the canonical train and test splits have $92$ questions that match exactly and $80$ target concepts in common. On MOOC-CS, they have $68$ exactly matching questions and $60$ target concepts in common. These overlap counts motivate the paper's question-disjoint and target-concept-disjoint controls: without them, headline retrieval numbers on the canonical splits would partly reflect reuse of training templates rather than generalization.
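Overlap counts of this kind can be audited with two set intersections. The sketch below is illustrative only and is not the paper's audit code: the record fields `question` and `target`, the function name `split_overlap`, and the toy examples are all assumptions, and it assumes that "exact" means string equality with no normalization and that each shared question or concept is counted once.

```python
def split_overlap(train, test):
    """Count distinct questions and target concepts present in both splits.

    Each example is assumed to be a dict with hypothetical keys
    "question" (the question text) and "target" (the target concept).
    """
    train_questions = {ex["question"] for ex in train}
    train_targets = {ex["target"] for ex in train}
    test_questions = {ex["question"] for ex in test}
    test_targets = {ex["target"] for ex in test}
    return {
        # Exact string matches only; no lowercasing or whitespace cleanup.
        "shared_questions": len(train_questions & test_questions),
        "shared_targets": len(train_targets & test_targets),
    }


# Toy data for illustration; these are not examples from either dataset.
train = [
    {"question": "Is recursion a prerequisite of dynamic programming?",
     "target": "dynamic programming"},
    {"question": "Is probability a prerequisite of Bayesian inference?",
     "target": "Bayesian inference"},
]
test = [
    {"question": "Is recursion a prerequisite of dynamic programming?",
     "target": "dynamic programming"},
    {"question": "Is calculus a prerequisite of Bayesian inference?",
     "target": "Bayesian inference"},
    {"question": "Is linear algebra a prerequisite of PCA?",
     "target": "PCA"},
]

overlap = split_overlap(train, test)
print(overlap)
```

A question-disjoint split would drive `shared_questions` to zero, and a target-concept-disjoint split would drive `shared_targets` to zero. The toy data shows that the two counts need not coincide: a test question can be new while its target concept was already seen in training.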

Updated 2026-05-17

References

Reference: Auditable Strict-Parity Evaluation of Prerequisite-Graph Retrieval for RAG under Leakage Controls
