# Semantic Graphs for Generating Deep Questions

This repository contains code and models for the paper *Semantic Graphs for Generating Deep Questions* (ACL 2020). Below is the framework of our proposed model (on the right) together with an input example (on the left).
## Requirements

### Environment

- pytorch 1.4.0
- nltk 3.4.4
- numpy 1.18.1
- tqdm 4.32.2
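For convenience, the pinned versions above can be captured in a `requirements.txt` (note that the pip package name for PyTorch is `torch`):

```
torch==1.4.0
nltk==3.4.4
numpy==1.18.1
tqdm==4.32.2
```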
## Data Preprocessing

We release all the datasets below, which are processed based on HotpotQA.

- get tokenized data files of `documents`, `questions`, and `answers`, together with the results of Dependency Parsing and Coreference Resolution on `documents`
  - get results in the folder `text-data`
- prepare the json files as illustrated in `build-semantic-graphs`
  - get results in the folder `json-data`
- run `scripts/preprocess_data.sh` to get the preprocessed data ready for training
  - get results in the folders `preprocessed-data` and `Datasets`
- utilize `glove.840B.300d.txt` from GloVe to initialize the word embeddings
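The embedding-initialization step above can be sketched as follows. This is a minimal illustration (not the repository's actual loader): rows of a randomly initialized matrix are overwritten with GloVe vectors for words found in the file, and a tiny two-line file stands in for `glove.840B.300d.txt`.

```python
import io

import numpy as np


def load_embeddings(vocab, glove_path, dim):
    # Start from small random vectors; overwrite rows for words found in the file.
    rng = np.random.default_rng(0)
    emb = rng.normal(0.0, 0.1, (len(vocab), dim))
    with io.open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")  # GloVe format: word v1 v2 ... vdim
            if parts[0] in vocab:
                emb[vocab[parts[0]]] = np.asarray(parts[1:], dtype=np.float32)
    return emb


# Toy demonstration: a two-line GloVe-style file stands in for glove.840B.300d.txt.
with io.open("toy_glove.txt", "w", encoding="utf-8") as f:
    f.write("the 0.1 0.2 0.3\nquestion 0.4 0.5 0.6\n")

vocab = {"the": 0, "question": 1, "unk": 2}
emb = load_embeddings(vocab, "toy_glove.txt", dim=3)
```

Words absent from GloVe (like `unk` here) simply keep their random initialization.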
## Models

We release both the classifier and generator models used in this work. The models are built on a sequence-to-sequence architecture: by default, the encoder uses a GRU together with a GNN and the decoder uses a GRU, but you can choose other methods (e.g., Transformer) that are also implemented in this repository.

- classifier: accuracy - 84.06773%
- generator: BLEU-4 - 15.28304
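To make the encoder-decoder setup concrete, here is a minimal numpy sketch of the GRU recurrence used on both sides of a sequence-to-sequence model. This is purely illustrative and is not the repository's implementation (which uses PyTorch and also incorporates a GNN over the semantic graph); the dimensions are arbitrary toy values.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    # One GRU step with the standard update-gate / reset-gate / candidate equations.
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (3 * hidden_size, input_size))   # input weights
        self.U = rng.normal(0.0, 0.1, (3 * hidden_size, hidden_size))  # recurrent weights
        self.h = hidden_size

    def step(self, x, h):
        n_h = self.h
        z = sigmoid(self.W[:n_h] @ x + self.U[:n_h] @ h)                       # update gate
        r = sigmoid(self.W[n_h:2 * n_h] @ x + self.U[n_h:2 * n_h] @ h)         # reset gate
        cand = np.tanh(self.W[2 * n_h:] @ x + self.U[2 * n_h:] @ (r * h))      # candidate state
        return (1 - z) * cand + z * h


def encode(cell, inputs):
    # Run the encoder GRU over a sequence; return the final hidden state.
    h = np.zeros(cell.h)
    for x in inputs:
        h = cell.step(x, h)
    return h


# Toy run: encode a 5-step sequence of 16-dim vectors into a 32-dim state,
# then unroll a decoder GRU for 3 steps (fed zero inputs for simplicity).
enc, dec = GRUCell(16, 32), GRUCell(16, 32, seed=1)
state = encode(enc, np.random.default_rng(2).normal(size=(5, 16)))
outputs = []
for _ in range(3):
    state = dec.step(np.zeros(16), state)
    outputs.append(state)
```

In practice the decoder input at each step would be the embedding of the previously generated token, and each hidden state would be projected to vocabulary logits.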
## Training

- run `scripts/train_classifier.sh` to train on the Content Selection task
- run `scripts/train_generator.sh` to train on the Question Generation task; by default, it fine-tunes on top of the pretrained classifier
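The fine-tuning default above amounts to warm-starting the generator with parameters it shares with the pretrained classifier. A minimal sketch of that idea, using plain dicts with hypothetical parameter names (in the repository this would be done with PyTorch state dicts):

```python
# Hypothetical parameter names, for illustration only.
pretrained_classifier = {
    "encoder.gru.weight": [1.0],
    "encoder.gnn.weight": [2.0],
    "classifier.out.weight": [3.0],
}
generator = {
    "encoder.gru.weight": [0.0],  # shared with the classifier: will be overwritten
    "encoder.gnn.weight": [0.0],  # shared with the classifier: will be overwritten
    "decoder.gru.weight": [0.0],  # generator-only: keeps its fresh initialization
}

# Copy only the parameters whose names exist in both models.
shared = {k: v for k, v in pretrained_classifier.items() if k in generator}
generator.update(shared)
```

Parameters unique to either model (the classifier's output layer, the generator's decoder) are left untouched.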
## Translating / Testing

- run `scripts/translate.sh` to get predictions on the validation dataset
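Predictions are evaluated with BLEU-4 (the generator score reported above). As a sanity check of the metric, here is how corpus-level BLEU-4 can be computed with nltk (a listed requirement); the tokenized hypothesis/reference pair is a made-up toy example, not repository output.

```python
from nltk.translate.bleu_score import corpus_bleu

# Toy example: one hypothesis that exactly matches its single reference.
references = [[["what", "did", "the", "author", "propose", "?"]]]
hypotheses = [["what", "did", "the", "author", "propose", "?"]]

# Equal weights over 1- to 4-gram precisions give BLEU-4.
bleu4 = corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25))
```

A perfect match yields a score of 1.0; real predictions are scored against the validation references the same way.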
## Citation

```
@article{pan2019sgdqg,
  title={Semantic Graphs for Generating Deep Questions},
  author={Liangming Pan and Yuxi Xie and Yansong Feng and Tat-Seng Chua and Min-Yen Kan},
  journal={ACL 2020},
  year={2020}
}
```