Release 1, 2018-03-23
This is the data used in "Concept Hierarchy Extraction from Textbooks" (Wang et al., 2015).
This dataset contains two parts:
- original book content and Wikipedia dictionary as described in the paper
- key concept labels
For book content and wiki dictionary:
- A wiki dictionary {bookname}.wikis is collected by running a BFS crawler on the wiki anchor network (as described in the paper)
- A clean wiki dictionary {bookname}.wikis_clean is generated by filtering out categories that have too few pages in the dictionary (as described in Section 3.1 of the paper)
- Each file under the folder {bookname}_content contains the content of one book chapter (each chapter in a separate file)
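As an illustration of the one-file-per-chapter layout, here is a minimal Python sketch that reads every file in a content folder into memory. It assumes the chapter files are plain text readable as UTF-8, which this README does not state; the function name load_chapters and the demo folder, file names, and text are made up for the example.

```python
from pathlib import Path
import tempfile


def load_chapters(content_dir):
    """Return {file name: text} for every regular file directly under content_dir."""
    chapters = {}
    for path in sorted(Path(content_dir).iterdir()):
        if path.is_file():
            # Assumption: files are plain text; undecodable bytes are replaced.
            chapters[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return chapters


# Demo on a throwaway folder with made-up file names and text.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "demo_content"
    demo_dir.mkdir()
    (demo_dir / "1").write_text("Chapter one text.", encoding="utf-8")
    (demo_dir / "2").write_text("Chapter two text.", encoding="utf-8")
    chapters = load_chapters(demo_dir)
```

For the real data, the call would be load_chapters("{bookname}_content") with {bookname} replaced by the actual book name.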
For key concept labels:
- {bookname}_vote_final.csv (separated by semicolons)
- col0: chapter number
- col1: chapter title
- col2: whether the concept is a key concept in this chapter; if this column is "1" or "2", the wiki concept is a key concept; otherwise, it is not
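The column layout above can be read with Python's standard csv module. The following is a minimal sketch, not a reference loader: it assumes the file has no header row and that the columns appear exactly as listed above, and the function name parse_vote_rows and the sample rows are made up for illustration.

```python
import csv
import io

# Per this README, "1" or "2" in col2 marks a key concept; anything else does not.
KEY_LABELS = {"1", "2"}


def parse_vote_rows(lines):
    """Parse semicolon-separated lines into (chapter_number, chapter_title, is_key)."""
    rows = []
    for row in csv.reader(lines, delimiter=";"):
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        is_key = row[2].strip() in KEY_LABELS
        rows.append((row[0].strip(), row[1].strip(), is_key))
    return rows


# Made-up sample rows, only to show the expected shape of the input.
sample = io.StringIO("1;Introduction;2\n1;Introduction;0\n2;Supply and Demand;1\n")
rows = parse_vote_rows(sample)
```

To read a real file, pass an open file handle instead of the sample, for example open(path, newline="", encoding="utf-8"), where path points to a {bookname}_vote_final.csv file.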
Please cite the following paper if you use this data.
@inproceedings{wang2015concept,
title={Concept hierarchy extraction from textbooks},
author={Wang, Shuting and Liang, Chen and Wu, Zhaohui and Williams, Kyle and Pursel, Bart and Brautigam, Benjamin and Saul, Sherwyn and Williams, Hannah and Bowen, Kyle and Giles, C Lee},
booktitle={Proceedings of the 2015 ACM Symposium on Document Engineering},
pages={147--156},
year={2015},
organization={ACM}
}
If you have any problems, please contact Shuting Wang at [email protected].
This dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.