sheffieldnlp / naacl2018-fever

Fact Extraction and VERification baseline published in NAACL2018

License: Apache License 2.0

Python 98.57% Shell 0.97% Dockerfile 0.47%

pytorch pytorch-implmention evidence-retrieval information-retrieval information-extraction fever verification baseline evaluation wikipedia

naacl2018-fever's Issues

Test MLP as 2-way classification task

Pre-trained model.tar.gz Not Available

I notice that https://jamesthorne.co/fever/model.tar.gz is not responding, so I cannot download the pre-trained model. Is there another site I can download it from?

Thank you.

problems with model 2 from readme

Some errors in the tokenizer used in allennlp==2.1.0. Is this the right version in requirements?
Misleading text in README: if I am right, model 2 only runs on GPU because of allennlp, so this specification makes no sense:

#if using a CPU, set
export CUDA_DEVICE=-1

#if using a GPU, set
export CUDA_DEVICE=0 #or cuda device id

Misleading title of model? I think this model is not an LSTM model (Parikh et al. 2016). The same applies to the FEVER dataset paper.

Final Evaluation/Scoring for AllenNLP modules

Train BiDAF on sentence spans given claims

Rationalise Scripts and Run Final Experiments

To run

MLP: Train on FNC, Evaluate on FNC, Evaluate on FEVER 3 way
MLP: Train on FEVER with sampled negative pages, Test
MLP: Train on FEVER with IR negative pages, Test
DR: Final score for recall/precision/MRR
DR: Score using Oracle RTE component
RTE: Pre-trained model, evaluate on FEVER
RTE: Train on FEVER bodies, evaluate on FEVER

Extra:

BiDAF: Precision/Recall of pretrained model
BiDAF: FEVER Accuracy using pretrained model on DRQA Pages
RTE: Train on BiDAF retrieved model: evaluate P/R of BiDAF. Evaluate FEVER score
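
For the "DR: Final score for recall/precision/MRR" item, a minimal sketch of mean reciprocal rank may help pin down what is being reported. The function names and the page titles in the usage line are hypothetical, and this is not the repository's eval_mrr.py, only an illustration of the metric.

```python
from typing import List, Set


def reciprocal_rank(ranked_pages: List[str], gold_pages: Set[str]) -> float:
    """Return 1/rank of the first gold page in the ranking, or 0.0 if none appears."""
    for rank, page in enumerate(ranked_pages, start=1):
        if page in gold_pages:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: List[List[str]], gold_sets: List[Set[str]]) -> float:
    """Average the reciprocal rank over all claims (0.0 for an empty input)."""
    scores = [reciprocal_rank(r, g) for r, g in zip(rankings, gold_sets)]
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative data only: the first claim finds its page at rank 2, the second never does.
example_mrr = mean_reciprocal_rank([["Athens", "Greece"], ["Round_and_Round"]],
                                   [{"Greece"}, {"Earth"}])
```

Here example_mrr averages 1/2 and 0, so a claim whose gold page is never retrieved pulls the score down rather than being ignored.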

Evaluation speed

How long is evaluation supposed to take? I'm running the evidence retrieval step on 8 vCPUs, 16 GB RAM and an SSD, and for the dev set it's projecting almost 10 hours.

Is this expected?

Per claim recall - no partial points. 1 if all evidence found. 0 if only partial information

Do not give any partial credit if only a partial set of documents is returned.
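
The all-or-nothing rule above could be sketched as follows. This assumes each claim has a single gold set of documents and that claims with no gold evidence are skipped. Both function names are hypothetical and this is not the project's scorer.

```python
from typing import Iterable, List, Set, Tuple


def strict_claim_recall(retrieved: Iterable[str], gold: Set[str]) -> int:
    """Score 1 only if every gold document was retrieved; otherwise 0."""
    return int(bool(gold) and gold.issubset(set(retrieved)))


def strict_recall(claims: List[Tuple[List[str], Set[str]]]) -> float:
    """Average the per-claim score over claims that have gold evidence."""
    scored = [strict_claim_recall(r, g) for r, g in claims if g]
    return sum(scored) / len(scored) if scored else 0.0


# Illustrative data: the second claim finds only one of its two gold pages, so it scores 0.
example = strict_recall([
    (["Greece", "Athens", "Sparta"], {"Greece", "Athens"}),
    (["Greece"], {"Greece", "Athens"}),
])
```

With this reading, the second claim contributes 0 rather than 0.5, so example is 0.5 overall.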

NameError: name 'get_count_matrix' is not defined

I see this after cloning the latest version.
Traceback (most recent call last):
File "src/scripts/build_tfidf.py", line 34, in <module>
count_matrix, doc_dict = get_count_matrix(
NameError: name 'get_count_matrix' is not defined
Looking at drqascripts.retriever.build_tfidf:
class TfIdfBuilder(builtins.object):
get_count_matrix(self)

But in src/scripts/build_tfidf.py it is called as if it were a standalone function.
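
To illustrate the mismatch described above, here is a toy sketch. ToyBuilder is a hypothetical stand-in and not the real drqascripts TfIdfBuilder, whose constructor arguments are not shown in this issue. It only demonstrates why a name defined as a method is not reachable as a bare function.

```python
class ToyBuilder:
    """Hypothetical stand-in for a class that exposes get_count_matrix as a method."""

    def __init__(self, docs):
        self.docs = docs

    def get_count_matrix(self):
        # Toy result: token counts per document plus a doc_id -> row index mapping.
        doc_ids = sorted(self.docs)
        doc_index = {doc_id: i for i, doc_id in enumerate(doc_ids)}
        counts = [len(self.docs[doc_id].split()) for doc_id in doc_ids]
        return counts, doc_index


def call_unbound():
    """Calling the bare name fails: no module-level function has that name."""
    try:
        get_count_matrix()  # noqa: F821 (deliberately undefined at module level)
    except NameError as err:
        return str(err)
    return None


# The method has to be reached through an instance instead.
builder = ToyBuilder({"doc_b": "two words", "doc_a": "one"})
counts, doc_index = builder.get_count_matrix()
```

If the real class follows the same pattern, the script would need to construct a builder first and call the method on it. What arguments that takes would have to be checked against the drqascripts source.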

Specify CUDA device override command line option

Sentence selection with TF-IDF metric

MLP training will crash if models directory doesn't exist
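
A minimal sketch of one possible guard, assuming the crash comes from writing a model file into a directory that has not been created yet. The helper name and the example path are hypothetical.

```python
import os
import tempfile


def ensure_parent_dir(path: str) -> str:
    """Create the directory that will contain `path` if it is missing."""
    parent = os.path.dirname(os.path.abspath(path))
    os.makedirs(parent, exist_ok=True)  # no error if the directory already exists
    return parent


# Usage with a throwaway directory; "models/mlp.model" is an illustrative path.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "models", "mlp.model")
    created = ensure_parent_dir(target)
    parent_exists = os.path.isdir(created)
```

Calling such a helper just before saving would make training independent of whether the directory was created by hand.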

TF-IDF flaw?

Played with the interactive mode and observed the following behaviour: the documents retrieved are relevant only to some part of the claim, e.g.:

"The earth is round" gets songs like "round and round"
"Greece has 11 million people." is refuted by pages that are not about Greece, but about millions of people
"Alcohol causes cancer" gets nothing about alcohol

I am wondering whether:

a simpler check would help: one that requires as many words from the claim as possible (possibly ignoring stopwords) to appear in the retrieved documents. Maybe a variant of ROUGE?
an alternative would be to train a document ranker, but maybe that wouldn't be part of the "baseline" approach.
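
The first suggestion could look roughly like the sketch below: the fraction of non-stopword claim words that also occur in a document. The stopword list is a tiny illustrative one and the function names are hypothetical. It is a plain word-overlap measure, not ROUGE itself.

```python
import re

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = frozenset({"the", "a", "an", "is", "are", "has", "of", "in", "to", "and"})


def content_words(text: str) -> set:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def claim_coverage(claim: str, document: str) -> float:
    """Fraction of the claim's content words that appear in the document."""
    claim_words = content_words(claim)
    if not claim_words:
        return 0.0
    return len(claim_words & content_words(document)) / len(claim_words)


# A page about "millions of people" covers only one of four claim words.
low = claim_coverage("Greece has 11 million people.", "Millions of people live in cities.")
high = claim_coverage("The earth is round", "The Earth is a round planet")
```

A threshold on this score could filter pages that match only one fragment of the claim. Note that exact matching treats "million" and "millions" as different words.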

Evidence Retrieval Evaluation being killed

MLP Model Training has some stochasticity - double check seeds etc.
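
As a starting point for the seed check, a sketch that seeds every random number generator it can find. It assumes numpy and torch may or may not be installed, and the helper name is hypothetical. GPU kernels may still introduce nondeterminism that seeding alone does not remove.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed Python's RNG and, when available, numpy and torch."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # torch not installed; nothing to seed


# Two runs with the same seed should draw the same numbers.
seed_everything(13)
first = random.random()
seed_everything(13)
second = random.random()
```

If results still differ between runs after this, the remaining variation is probably coming from somewhere other than the seeded generators, for example data ordering or GPU operations.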

Dataset summary statistics for paper

Label distribution.
Claim length

Read Lines from Wiki page into sqlite db

Learning Rate Schedule

Cannot find "OnlineTfidfDocRanker"

Hi,
Inside the "process_tfidf_drqa.py" file, there is a line trying to import OnlineTfidfDocRanker (i.e. "from drqascripts.retriever.build_tfidf_lines import OnlineTfidfDocRanker"). However, I cannot locate this file within the provided codebase. Where can I find it?

Got TypeError: unhashable type: 'list' when running eval_mrr.py

I got the error

Traceback (most recent call last):
File "src/scripts/retrieval/document/eval_mrr.py", line 22, in <module>
evidence = set([t[1] for t in js["evidence"] if isinstance(t,list) and len(t)>1])
TypeError: unhashable type: 'list'

when running

python src/scripts/retrieval/document/eval_mrr.py --split dev --count 5

It seems that t[1] can be a list, but isn't it supposed to be a string?
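
One possible explanation, offered as an assumption rather than a diagnosis, is that some evidence entries are nested one level deeper (a list of rows instead of a single row), so t[1] is itself a row. The sketch below collects one field from either shape. The function name, the field positions and the sample data are all hypothetical.

```python
def collect_field(evidence, index):
    """Collect row[index] from flat rows or from one level of nested row groups."""
    values = set()
    for item in evidence:
        if not isinstance(item, list) or not item:
            continue
        # A nested group is a list whose first element is itself a list.
        rows = item if isinstance(item[0], list) else [item]
        for row in rows:
            if len(row) > index and row[index] is not None and not isinstance(row[index], list):
                values.add(row[index])
    return values


# Invented examples of both shapes; the real field order should be checked in dev.jsonl.
flat = [[101, "Greece", 0]]
nested = [[[101, 201, "Greece", 0]], [[102, 202, "Athens", 3], [102, 203, "Greece", 1]]]
flat_pages = collect_field(flat, 1)
nested_pages = collect_field(nested, 2)
```

Printing one js["evidence"] value from the input file would confirm which shape, and which index, applies before changing the script.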

pytorch 0.3.1 seems too outdated to install

Hi,
I hit an error when I tried to build from the Dockerfile:

command:

$ sudo docker build .

error:

Step 21/23 : RUN conda install -y pytorch=0.3.1 torchvision -c pytorch
...
CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/pytorch/linux-64/pytorch-0.3.1-py36_cuda8.0.61_cudnn7.1.2_3.tar.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

Cheers,

Some training data files missing

When trying to train a model I get an error:

FileNotFoundError: [Errno 2] No such file or directory: 'data/fever/train.ns.pages.p1.jsonl'

I believe a step is missing from the instructions. Maybe the script src/scripts/dataset/download_dataset.py needs to be run with appropriate parameters to get the needed files?

Thanks,
Slavko

Dockerize FEVER

Sample some of the dataset and check it is consistent with guidelines

In particular:
Use of common sense/world knowledge

Generate baseline train/dev/test splits

3k labels from each class to create balanced dev/test
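
A sketch of drawing the same number of examples per class, assuming each example is a dict with a "label" key. The function name and the toy data are hypothetical, and the per-class count would be 3000 in the setting described above.

```python
import random
from collections import defaultdict


def balanced_sample(examples, per_class, seed=0, label_key="label"):
    """Draw `per_class` examples from every label, reproducibly for a given seed."""
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[label_key]].append(example)
    rng = random.Random(seed)  # local RNG so the global state is untouched
    sample = []
    for label in sorted(by_label):
        items = by_label[label]
        if len(items) < per_class:
            raise ValueError(f"only {len(items)} examples for label {label!r}")
        sample.extend(rng.sample(items, per_class))
    return sample


# Toy data: 5 claims per label, 3 drawn from each.
toy = [{"id": i, "label": label}
       for label in ("SUPPORTS", "REFUTES", "NOT ENOUGH INFO")
       for i in range(5)]
picked = balanced_sample(toy, per_class=3, seed=42)
```

Removing the picked examples from the pool before drawing the second split would keep dev and test disjoint.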

Builddb.py will fail if the fever data directory doesn't exist

Get error in the initialization regex

When I run src/scripts/rte/da/eval_da.py, it reports "Did not use initialization regex that was passed: .*token_embedder_tokens\._projection.*weight". My allennlp version is 0.2.3. How can I solve this error?

MLP model training crashes if models directory doesn't exist

Key Error when running drQA

When running the following command:

PYTHONPATH=src python src/scripts/retrieval/ir.py --db data/fever/fever.db --model data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file data/fever-data/dev.jsonl --out-file data/fever/dev.sentences.p5.s5.jsonl --max-page 5 --max-sent 5

The new version of the code gives me a key error.

The exception comes from the line (from drqascripts/retriever/build_tfidf.py in function count)
col.extend([DOC2IDX[doc_id]] * len(counts))

I do not get this error if I set the flag --parallel to false. I am not sure, but I am guessing that the issue lies in the multiprocessing part of the code.

Thanks for your help!

Textual Entailment on entire retrieved document using pretrained model with decomposable attention

Retrieve relevant pages given page using DrQA

Confusion matrix and scoring on dev/test set predictions

Evaluate RTE model trained on FNC using on Fever Data

Scripts to sample data for review

Use different tokenizer for claims to wikipedia articles

Wikipedia articles are pre-tokenized and just need splitting by space. Claims need to be tokenized properly
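
The distinction above might be sketched like this. The regex tokenizer is a simple stand-in chosen for illustration and is not the tokenizer used by the repository. Both function names are hypothetical.

```python
import re

# Words (letters, digits, underscore) or single punctuation marks.
_TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize_article(text: str) -> list:
    """Pre-tokenized Wikipedia text: splitting on whitespace is enough."""
    return text.split()


def tokenize_claim(text: str) -> list:
    """Raw claim text: separate punctuation from the words it is attached to."""
    return _TOKEN.findall(text)


claim_tokens = tokenize_claim("Greece has 11 million people.")
article_tokens = tokenize_article("Greece has 11 million people .")
```

With whitespace splitting alone, the claim would end in the token "people." and fail to match "people" in the article, which is the mismatch this issue is about.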

TF no IDF

Upper bound for eval_p recall on k=1,5,10,100

Documentation for baseline models

Textual Entailment using Benjamin Riedel's model

Error Analysis

how often did DR return the right page?
how often did SR return the right page?
how often did SR return the original evidence?
for the times where SR returned different evidence. What are the differences between BLEU/ROUGE similarities between the claim and returned evidence vs claim and gold evidence?
Error coding scheme

Logging in builddb script

Currently the builddb script uses facebook's logging - we should use FEVER's logging

DrQA on Interactive Mode

installation fails with pip 10.0.1

Building wheels for collected packages: drqa, fever-scorer, drqa, fever-scorer
Running setup.py bdist_wheel for drqa ... done
Stored in directory: /tmp/pip-ephem-wheel-cache-cfhla23v/wheels/25/a8/71/6390f88d8b3ecda4c32998985670851ed7281bfa8ced27196e
Running setup.py bdist_wheel for fever-scorer ... done
Stored in directory: /tmp/pip-ephem-wheel-cache-cfhla23v/wheels/e0/c6/1a/8ff7f96802122bf337bfc8e05852f7d5618a6cffc95b5ee624
Running setup.py bdist_wheel for drqa ... error
Complete output from command /home/ubuntu/anaconda3/envs/fever/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-vnfl6_v5/drqa/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-5hao0iio --python-tag cp36:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/ubuntu/anaconda3/envs/fever/lib/python3.6/tokenize.py", line 452, in open
buffer = _builtin_open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-vnfl6_v5/drqa/setup.py'

Failed building wheel for drqa
Running setup.py clean for drqa
Complete output from command /home/ubuntu/anaconda3/envs/fever/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-vnfl6_v5/drqa/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" clean --all:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/ubuntu/anaconda3/envs/fever/lib/python3.6/tokenize.py", line 452, in open
buffer = _builtin_open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-vnfl6_v5/drqa/setup.py'

Failed cleaning build dir for drqa
Running setup.py bdist_wheel for fever-scorer ... error
Complete output from command /home/ubuntu/anaconda3/envs/fever/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-vnfl6_v5/fever-scorer/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-3rmxr6dz --python-tag cp36:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/ubuntu/anaconda3/envs/fever/lib/python3.6/tokenize.py", line 452, in open
buffer = _builtin_open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-vnfl6_v5/fever-scorer/setup.py'

Failed building wheel for fever-scorer
Running setup.py clean for fever-scorer
Complete output from command /home/ubuntu/anaconda3/envs/fever/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-vnfl6_v5/fever-scorer/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" clean --all:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/ubuntu/anaconda3/envs/fever/lib/python3.6/tokenize.py", line 452, in open
buffer = _builtin_open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-vnfl6_v5/fever-scorer/setup.py'

Failed cleaning build dir for fever-scorer

Evidence Picking train own BiDAF.

Implement Early Stopping

ImportError: cannot import name 'Dataset'

When I trained the Decomposable Attention model, I got the bug:

File "src/scripts/rte/da/train_da.py", line 9, in <module>
    from allennlp.data import Vocabulary, Dataset, DataIterator, DatasetReader, Tokenizer, TokenIndexer
ImportError: cannot import name 'Dataset'

I installed allennlp from source and when I looked into the folder data, there's nothing called Dataset. Does anybody know how to fix this problem? Thanks!

Failed to build DrQA

Never mind, it looks like it was just a permission issue.

My hunch is that the baseline uses a specific version of DrQA, different from the one in the DrQA repo. However, when I try to install it with the following command, as specified in the requirements:

pip3 install git+git://github.com/j6mes/drqa@fever#egg=DrQA-0.1.3

I got the error:

Building wheels for collected packages: drqa, drqa
  Running setup.py bdist_wheel for drqa ... done
  Stored in directory: /private/var/folders/gk/hj0gqfws7sj4s3dk8rnckplc0000gn/T/pip-ephem-wheel-cache-ok8ikj6r/wheels/2a/62/41/ddc1e0efc8a4f3becd45012e6624752e2fe5fbf733a5b61d3a
  Running setup.py bdist_wheel for drqa ... error
  Complete output from command /usr/local/opt/python3/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/private/var/folders/gk/hj0gqfws7sj4s3dk8rnckplc0000gn/T/pip-install-6vmieha8/drqa/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /private/var/folders/gk/hj0gqfws7sj4s3dk8rnckplc0000gn/T/pip-wheel-t6ci4qrb --python-tag cp36:
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tokenize.py", line 452, in open
      buffer = _builtin_open(filename, 'rb')
  FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/gk/hj0gqfws7sj4s3dk8rnckplc0000gn/T/pip-install-6vmieha8/drqa/setup.py'
  
  ----------------------------------------
  Failed building wheel for drqa
  Running setup.py clean for drqa
  Complete output from command /usr/local/opt/python3/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/private/var/folders/gk/hj0gqfws7sj4s3dk8rnckplc0000gn/T/pip-install-6vmieha8/drqa/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" clean --all:
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tokenize.py", line 452, in open
      buffer = _builtin_open(filename, 'rb')
  FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/gk/hj0gqfws7sj4s3dk8rnckplc0000gn/T/pip-install-6vmieha8/drqa/setup.py'
  
  ----------------------------------------
  Failed cleaning build dir for drqa
Successfully built drqa
Failed to build drqa

Can somebody help please? Thank you!
