The ede from morningmoni

This repo provides the code for the papers "Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation" (EMNLP 2021) and "Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation" (arXiv 2020).

Code

[update] Code for Constrained Abstractive Summarization is in folder CAS/

The code is based on the seq2seq examples of huggingface transformers (3.0 <= version < 4.0)

The most important files are as follows:

DDBA.py: core functions for constrained generation, including a PyTorch implementation of DBA [1], adapted from the official MXNet implementation

finetune.py: model training

run_eval.py: model inference with or without constraints

transformers_local/generation_utils.py: modified versions of the model decoding functions, changed to enforce constraints

transformers_local/modeling_bart.py: implementation of BART with a copy mechanism, plus other helper functions

[1] "Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation", NAACL 2018

Failing to reproduce paper results

Hello!

I have read the paper "Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation" and found it really interesting. I tried to reproduce the results for Question Generation on the SQuAD dataset, but failed. My Rouge-L score is 13.4818 for DBA and 9.5297 for DDBA. Clearly I've done something badly wrong, and I would appreciate your help. Here are all the steps I took:

1. I downloaded the SQuAD dataset here. Then I separated the source inputs and target outputs into train/val/test.source/target files. The train file contains the whole training set. The val and test files are identical and contain the dev set from the SQuAD website. The examples can be found here.
2. I ran python finetune.py. I did not modify finetune.py or conf.py. The code completed successfully and saved all the checkpoints.
3. To test the pipeline, I started with simple spacy-generated constraints. In the paper they are referred to as "gold constraints". I used the en_core_web_sm spacy model to extract entities, following the example here. The results were placed in a constraint_kpe_em.json file. You can check it here.
4. Finally, for the evaluation I ran python run_eval.py and python run_eval.py --partial True to get the DBA and DDBA scores, respectively. I did not change anything in the run_eval.py file. The scores came out low, as mentioned above.
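For reference, the data preparation in the first step can be sketched roughly as follows. This is only a sketch under stated assumptions, not the repo's own preprocessing: the function names squad_to_pairs and write_split are hypothetical, and the "answer | context" layout of each source line is an assumption, since the input format the paper used for Question Generation is not stated here. Only the nested data/paragraphs/qas layout of the SQuAD v1.1 JSON is taken as given.

```python
import json
from pathlib import Path


def squad_to_pairs(squad_json, sep=" | "):
    """Yield (source, target) string pairs from a SQuAD v1.1-style dict.

    ASSUMPTION: each source line is "<answer><sep><context>" and each target
    line is the question. The format used in the paper may differ.
    """
    for article in squad_json["data"]:
        for paragraph in article["paragraphs"]:
            # Collapse newlines/tabs so each example stays on exactly one line.
            context = " ".join(paragraph["context"].split())
            for qa in paragraph["qas"]:
                if not qa.get("answers"):
                    continue  # skip questions that have no answer span
                answer = " ".join(qa["answers"][0]["text"].split())
                question = " ".join(qa["question"].split())
                yield f"{answer}{sep}{context}", question


def write_split(squad_json, out_dir, split):
    """Write <split>.source and <split>.target, one example per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pairs = list(squad_to_pairs(squad_json))
    source_text = "".join(src + "\n" for src, _ in pairs)
    target_text = "".join(tgt + "\n" for _, tgt in pairs)
    (out_dir / f"{split}.source").write_text(source_text, encoding="utf-8")
    (out_dir / f"{split}.target").write_text(target_text, encoding="utf-8")
    return len(pairs)


def load_squad(path):
    """Load a SQuAD JSON file from disk (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Keeping every example on a single line matters here, because the .source and .target files are paired by line number; a stray newline inside a context would shift every later pair.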

I am now working on automatic constraint generation and trying to apply this repo to the SQuAD dataset. Am I correct that your repo uses this code to create constraints? I couldn't figure out how to apply it to SQuAD, though.

However, given the low scores, I suspect that I may also have done something wrong in the steps described above. Perhaps the paper used special hyperparameters (different from the defaults) for the Question Generation task? Could you please help me figure out what's wrong, or suggest what steps to take to get better scores?
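One quick way to narrow this down is to recompute a rough Rouge-L on the generated questions outside the pipeline. The sketch below is a simplified, LCS-based Rouge-L F-measure that assumes lowercasing and whitespace tokenization with no stemming, so its numbers will not match an official scorer exactly; the function names are hypothetical and not part of the repo. It is mainly useful for spotting gross problems such as misaligned hypothesis and reference files.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(hypothesis, reference):
    """Sentence-level Rouge-L F-measure in [0, 1].

    ASSUMPTION: lowercased whitespace tokens, no stemming.
    """
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not hyp or not ref:
        return 0.0
    lcs = lcs_length(hyp, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def corpus_rouge_l(hypotheses, references):
    """Average sentence-level score, scaled to 0-100."""
    if len(hypotheses) != len(references):
        # A length mismatch usually means the two files are not line-aligned.
        raise ValueError("hypotheses and references differ in length")
    scores = [rouge_l_f1(h, r) for h, r in zip(hypotheses, references)]
    return 100 * sum(scores) / len(scores) if scores else 0.0
```

If this rough score is far above the reported one, the evaluation inputs are the likely culprit; if it is similarly low, the problem is more likely in training or in the constraints themselves.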
