
coref-hoi's Introduction

End-to-End Coreference Resolution with Different Higher-Order Inference Methods

This repository contains the implementation of the paper: Revealing the Myth of Higher-Order Inference in Coreference Resolution.

Architecture

The basic end-to-end coreference model is a PyTorch re-implementation of the TensorFlow model, following similar preprocessing (see this repository).

Four higher-order inference (HOI) methods are experimented with: Attended Antecedent, Entity Equalization, Span Clustering, and Cluster Merging. All are included here except Entity Equalization, which is experimented with in the equivalent TensorFlow environment (see this separate repository).

Files (as referenced throughout this document):

  • run.py: training entry point
  • evaluate.py: evaluation on the dev/test set
  • predict.py: prediction on custom input
  • preprocess.py: data preprocessing (e.g. splitting documents into segments)
  • model.py: the coreference model
  • experiments.conf: model configurations
  • setup_data.sh: dataset setup script

Basic Setup

Set up environment and data for training and evaluation:

  • Install Python3 dependencies: pip install -r requirements.txt
  • Create a directory for data that will contain all data files, models and log files; set data_dir = /path/to/data/dir in experiments.conf
  • Prepare the dataset (requires the OntoNotes 5.0 corpus): ./setup_data.sh /path/to/ontonotes /path/to/data/dir

For SpanBERT, download the pretrained weights from this repository, and rename the unzipped directory to /path/to/data/dir/spanbert_base or /path/to/data/dir/spanbert_large accordingly.
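For example (a minimal sketch; the archive name spanbert_hf_base.tar.gz is an assumption and depends on which download you pick):

tar xf spanbert_hf_base.tar.gz
mv spanbert_hf_base /path/to/data/dir/spanbert_base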

Evaluation

Provided trained models:

  • SpanBERT + no HOI: FILE
  • SpanBERT + Attended Antecedent: FILE
  • SpanBERT + Span Clustering: FILE
  • SpanBERT + Cluster Merging: FILE
  • SpanBERT + Entity Equalization: see repository

The name of each directory corresponds to a configuration in experiments.conf. Each directory contains two trained models.

If you want to use the official evaluator, download and unzip the CoNLL-2012 scorer under this directory.
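For example (assuming the official scorer from the reference-coreference-scorers repository, the same v8.01 scorer that setup_data.sh fetches):

git clone https://github.com/conll/reference-coreference-scorers.git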

Evaluate a model on the dev/test set:

  • Download the corresponding model directory and unzip it under data_dir
  • python evaluate.py [config] [model_id] [gpu_id]
    • e.g. Attended Antecedent: python evaluate.py train_spanbert_large_ml0_d2 May08_12-38-29_58000 0

Prediction

Prediction on custom input: see python predict.py -h

  • Interactive user input: python predict.py --config_name=[config] --model_identifier=[model_id] --gpu_id=[gpu_id]
    • E.g. python predict.py --config_name=train_spanbert_large_ml0_d1 --model_identifier=May10_03-28-49_54000 --gpu_id=0
  • Input from file (jsonlines file of this format; see the example line below): python predict.py --config_name=[config] --model_identifier=[model_id] --gpu_id=[gpu_id] --jsonlines_path=[input_path] --output_path=[output_path]
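As a rough illustration, a single input line might look like the following (a sketch only: the field names follow the common e2e-coref jsonlines convention, so check the linked format description for the exact schema):

{"doc_key": "nw_example_0", "sentences": [["I", "saw", "Alice", "."], ["She", "waved", "."]], "speakers": [["spk1", "spk1", "spk1", "spk1"], ["spk1", "spk1"]]}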

Training

python run.py [config] [gpu_id]

  • [config] can be any configuration in experiments.conf
  • Log files will be saved at your_data_dir/[config]/log_XXX.txt
  • Models will be saved at your_data_dir/[config]/model_XXX.bin
  • TensorBoard logs are available at your_data_dir/tensorboard
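For example, to train the SpanBERT-Large + Attended Antecedent configuration referenced in the evaluation example above:

python run.py train_spanbert_large_ml0_d2 0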

Configurations

Some important configurations in experiments.conf (an illustrative override example follows the list):

  • data_dir: the full path to the directory containing the dataset, models, and log files
  • coref_depth and higher_order: control the higher-order inference method
  • bert_pretrained_name_or_path: the name or path of the pretrained BERT model (HuggingFace BERT models)
  • max_training_sentences: the maximum number of segments to use when a document is too long; for BERT-Large and SpanBERT-Large, set to 3 for a 32GB GPU or 2 for a 24GB GPU
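For instance, a derived experiment might override just a few of these options. This is an illustrative sketch in the HOCON style used by experiments.conf; the inheritance syntax and base configuration name should be checked against the file itself:

my_spanbert_large_hoi = ${train_spanbert_large_ml0_d1} {
  coref_depth = 2                 # enable one extra round of higher-order inference
  higher_order = attended_antecedent
  max_training_sentences = 3      # 32GB GPU
}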

Citation

@inproceedings{xu-choi-2020-revealing,
    title = "Revealing the Myth of Higher-Order Inference in Coreference Resolution",
    author = "Xu, Liyan  and  Choi, Jinho D.",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.686",
    pages = "8527--8533"
}


coref-hoi's Issues

CUDA out of memory error

Hi,

First, I want to thank you so much for your valuable efforts, and this perfectly comprehensible and clean code.

I do not know whether I should ask this here, but I ran into a CUDA out of memory error in the evaluation phase, something like this: RuntimeError: CUDA out of memory. Tried to allocate 1.02 GiB (GPU 0; 7.93 GiB total capacity; 4.76 GiB already allocated; 948.81 MiB free; 6.23 GiB reserved in total by PyTorch).

At first, I ran into this error in the training phase. I reduced the values of some parameters in the experiments.conf file that I thought would lower GPU usage, and they did: I can now get through training. However, the error still appears in the evaluation phase no matter how much I decrease parameters like the span width, max_sentence_len, or the ffnn size. I wonder if you had the same problem, or whether you have any suggestions for me.

I am currently using GeForce GTX 1080 with 8GB memory.

Many thanks,
Arad
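A generic PyTorch-side check, independent of this repository: make sure evaluation runs without gradient tracking, since stored activations can easily exhaust an 8GB card. A minimal sketch, where model and eval_batches are placeholder names:

import torch

model.eval()                      # disable dropout etc.
with torch.no_grad():             # do not keep activations for backprop
    for batch in eval_batches:
        outputs = model(*batch)
torch.cuda.empty_cache()          # optionally release cached blocks between documents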

License

The repo does not contain any license specification. It would be great if you could license it explicitly under a FOSS license so that further research can build upon this great code!
Personally I'd suggest the MIT license, but an Apache or a GPL variety could also be a great choice.

Most of these licenses require attribution in source code distributions, so you would have to be credited (as you should be 😃).

Train on spanbert large, but get F1 1 point lower than presented in paper

Hi,

I used the spanbert large model with the default parameters in the config file, and I get Avg. F1 78.27, lower than the Avg. F1 79.9 in the paper.
The config is as follows:

num_docs = 2802
bert_learning_rate = 1e-05
task_learning_rate = 0.0003
max_segment_len = 512
ffnn_size = 3000
cluster_ffnn_size = 3000
max_training_sentences = 3
bert_tokenizer_name = bert-base-cased

max_top_antecedents = 50
max_training_sentences = 5
top_span_ratio = 0.4
max_num_extracted_spans = 3900
max_num_speakers = 20
max_segment_len = 256

# Learning

bert_learning_rate = 1e-5
task_learning_rate = 2e-4
loss_type = marginalized # {marginalized, hinge}
mention_loss_coef = 0
false_new_delta = 1.5 # For loss_type = hinge
adam_eps = 1e-6
adam_weight_decay = 1e-2
warmup_ratio = 0.1
max_grad_norm = 1 # Set 0 to disable clipping
gradient_accumulation_steps = 1

# Model hyperparameters

coref_depth = 1 # when 1: no higher order (except for cluster_merging)
higher_order = attended_antecedent # {attended_antecedent, max_antecedent, entity_equalization, span_clustering, cluster_merging}
coarse_to_fine = true
fine_grained = true
dropout_rate = 0.3
ffnn_size = 1000
ffnn_depth = 1
cluster_ffnn_size = 1000 # For cluster_merging
cluster_reduce = mean # For cluster_merging
easy_cluster_first = false # For cluster_merging
cluster_dloss = false # cluster_merging
num_epochs = 24
feature_emb_size = 20
max_span_width = 30
use_metadata = true
use_features = true
use_segment_distance = true
model_heads = true
use_width_prior = true # For mention score
use_distance_prior = true # For mention-ranking score

# Other

conll_eval_path = dev.english.v4_gold_conll # gold_conll file for dev
conll_test_path = test.english.v4_gold_conll # gold_conll file for test
genres = ["bc", "bn", "mz", "nw", "pt", "tc", "wb"]
eval_frequency = 1000
report_frequency = 100

train on bert base

Hello, I'd like to know about the results of training this model on BERT-base. I have trained on bert base with c2f (python run.py train_bert_base_ml0_d2), but I only get a result of about 67 F1.

Data setup issue in Basic Setup

  1. Install Python3 dependencies: pip install -r requirements.txt
  2. Create a directory for data that will contain all data files, models and log files; set data_dir = /path/to/data/dir in experiments.conf

After steps 1 and 2, I tried step 3 of the Basic Setup:

Prepare dataset (requiring OntoNotes 5.0 corpus): ./setup_data.sh /path/to/ontonotes /path/to/data/dir

...

reference-coreference-scorers/v8.01/test/DataFiles/TC-N.key
reference-coreference-scorers/v8.01/test/test.pl
reference-coreference-scorers/v8.01/test/TestCases.README
bash: conll-2012/v3/scripts/skeleton2conll.sh: No such file or directory

However, the file coref_hoi/data/dir/conll-2012/v3/scripts/skeleton2conll.sh does exist.
Do I need to change any other file prior to running setup_data.sh?

ValueError when predicting

All the required data and models have been downloaded into the proper paths.

Trying to run predict.py with the command:
python predict.py --config_name=train_spanbert_large_ml0_d2 --model_identifier=May08_12-38-29_58000 --gpu_id=0
I encounter a ValueError:

Traceback (most recent call last):
File "predict.py", line 71, in
nlp.add_pipe(nlp.create_pipe('sentencizer'))
File "/home/qliu/anaconda3/envs/e2e/lib/python3.6/site-packages/spacy/language.py", line 754, in add_pipe
raise ValueError(err)
ValueError: [E966] nlp.add_pipe now takes the string name of the registered component factory, not a callable component. Expected string, but got <spacy.pipeline.sentencizer.Sentencizer object at 0x7f7fabe3f288> (name: 'None').

  • If you created your component with nlp.create_pipe('name'): remove nlp.create_pipe and call nlp.add_pipe('name') instead.

  • If you passed in a component like TextCategorizer(): call nlp.add_pipe with the string name instead, e.g. nlp.add_pipe('textcat').

  • If you're using a custom component: Add the decorator @Language.component (for function components) or @Language.factory (for class components / factories) to your custom component and assign it a name, e.g. @Language.component('your_name'). You can then run nlp.add_pipe('your_name') to add it to the pipeline.
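Following the error message's own suggestions, the fix for spaCy v3 is a one-line change in predict.py:

# spaCy v3: pass the registered component name instead of a callable
nlp.add_pipe('sentencizer')  # replaces nlp.add_pipe(nlp.create_pipe('sentencizer'))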

trained weights for base

Good work.
I only see weights for large; could you also provide weights for base? That would be much easier to handle for debugging.
Thanks.

Custom training data for coref-hoi

Hi all,
I was wondering if it is possible to use custom data that one can prepare oneself for training this model. If so, how does one do this with coref-hoi? Will it convert a txt file to the right format, or does one have to convert it to a CoNLL file first? Can it be CoNLL-U? Thank you very much.

Training issue: with bert_base

Hi @lxucs,

I want to train a model for bert_base with no HOI like the spanbert_large_ml0_d1 model

python run.py bert_base 0

Got this issue:


Traceback (most recent call last):
File "run.py", line 289, in
model = runner.initialize_model()
File "run.py", line 51, in initialize_model
model = CorefModel(self.config, self.device)
File "/VL/space/sushantakp/research_work/coref-hoi/model.py", line 33, in init
self.bert = BertModel.from_pretrained(config['bert_pretrained_name_or_path'])
File "/VL/space/sushantakp/.conda/envs/skp_env376/lib/python3.7/site-packages/transformers/modeling_utils.py", line 935, in from_pretrained
raise EnvironmentError(msg)
OSError: Can't load weights for 'bert-base-cased'. Make sure that:

  • 'bert-base-cased' is a correct model identifier listed on 'https://huggingface.co/models'
  • or 'bert-base-cased' is the correct path to a directory containing a file named one of pytorch_model.bin, tf_model.h5, model.ckpt.

Do I need to change any parameter in experiments.conf

  • to handle the above issue, or

  • to train with HOI / no HOI?
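One common workaround (a sketch, not from the repository: it assumes internet access and uses the standard transformers save_pretrained API) is to download the weights once and point the config at the local copy:

from transformers import BertModel, BertTokenizer

# Download bert-base-cased once and save it where experiments.conf can find it.
model = BertModel.from_pretrained('bert-base-cased')
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model.save_pretrained('/path/to/data/dir/bert_base')
tokenizer.save_pretrained('/path/to/data/dir/bert_base')

Then set bert_pretrained_name_or_path = /path/to/data/dir/bert_base in experiments.conf.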

which checkpoint of the trained weights should I use?

Hi lxucs,
There are 2 checkpoints of the trained weights; which one is used in your paper?
Thanks

Below is an example:

train_spanbert_large_ml0_cm_fn1000_max_dloss/model_May14_05-15-38_63000.bin
train_spanbert_large_ml0_cm_fn1000_max_dloss/model_May22_23-31-16_66000.bin

Preprocess - Split into segments function

Hi again Liyan,

I had some brief questions regarding splitting documents into segments. I think the segments contain more than one sentence (based on the split_into_segments function in the preprocess.py file). Wouldn't it be better if each segment contained just one sentence? I could not see the intuition behind this. Is it better to have longer segments, or is it for more efficient use of resources? Or has it been tested in practice that the trained model gains better accuracy this way?

Thanks,
Arad
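For context, multi-sentence segments can be understood as greedy packing: each segment accumulates whole sentences until adding the next one would exceed max_segment_len, which reduces the number of encoder passes and padding waste compared to one sentence per segment. A simplified sketch of the idea (not the repo's exact split_into_segments code, which also tracks sentence maps, speakers, and special tokens):

def split_into_segments_sketch(sentences, max_segment_len):
    # sentences: list of sentences, each a list of subtokens
    segments, current = [], []
    for sent in sentences:
        if current and len(current) + len(sent) > max_segment_len:
            segments.append(current)   # close the segment before it overflows
            current = []
        current.extend(sent)
    if current:
        segments.append(current)
    return segments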
