Concept

Improved Concept Embeddings for Learning Prerequisite Chains: Training set

A corpus of 7,472 text files in total, consisting of two parts:

  • LectureBank (Li et al., 2018): a manually collected dataset of 1,352 lecture slide presentations from 60 courses covering 5 domains: Natural Language Processing (nlp), Machine Learning (ml), Artificial Intelligence (ai), Deep Learning (dl), and Information Retrieval (ir).
  • TutorialBank (Fabbri et al., 2018): a manually collected dataset of over 6,000 resources, ranging from HTML pages (.txt) to lecture slides and textbooks (.pdf, .pptx, and .ppt), mainly in the domain of NLP.

Updated 2020-08-04

Tags

Data Science