
sheffieldnlp / naacl2018-fever

Stars: 127, Watchers: 12, Forks: 41, Size: 1.04 MB

Fact Extraction and VERification baseline published in NAACL2018

Home Page: http://fever.ai

License: Apache License 2.0

Languages: Python 98.57%, Shell 0.97%, Dockerfile 0.47%
Topics: pytorch, pytorch-implmention, evidence-retrieval, information-retrieval, information-extraction, fever, verification, baseline, evaluation, wikipedia

naacl2018-fever's People

Contributors

andreasvlachos, drevicko, gruentee, j6mes

naacl2018-fever's Issues

problems with model 2 from readme

  • There are some errors in the tokenizer used in allennlp==2.1.0. Is this the right version in the requirements?
  • Misleading text in the README: if I am right, model 2 only runs on a GPU because of allennlp, so this setting makes no sense:

#if using a CPU, set
export CUDA_DEVICE=-1

#if using a GPU, set
export CUDA_DEVICE=0 #or cuda device id

  • Misleading model title? I think this model is not an LSTM model but the decomposable attention model of Parikh et al. (2016); the same applies to the FEVER dataset paper.
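For reference, a minimal sketch of how a script might interpret the `CUDA_DEVICE` variable from the README snippet, assuming the convention shown there (-1 selects the CPU, a non-negative integer is a CUDA device id). The helper `resolve_cuda_device` is hypothetical and not part of the repository; it says nothing about whether the pinned allennlp version actually runs on CPU, which is what the issue questions.

```python
import os
from typing import Mapping, Optional


def resolve_cuda_device(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the device id from CUDA_DEVICE; -1 selects the CPU.

    Hypothetical helper: assumes the README convention that -1 means CPU and
    any non-negative integer is a CUDA device id. Falls back to CPU when unset.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_DEVICE", "-1").strip()
    try:
        device = int(raw)
    except ValueError:
        raise ValueError(f"CUDA_DEVICE must be an integer, got {raw!r}") from None
    if device < -1:
        raise ValueError(f"CUDA_DEVICE must be -1 (CPU) or a GPU id, got {device}")
    return device


print(resolve_cuda_device({"CUDA_DEVICE": "-1"}))  # -1 -> run on CPU
print(resolve_cuda_device({"CUDA_DEVICE": "0"}))   # 0 -> first GPU
```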

Rationalise Scripts and Run Final Experiments

To run

  • MLP: Train on FNC, Evaluate on FNC, Evaluate on FEVER 3 way
  • MLP: Train on FEVER with sampled negative pages, Test
  • MLP: Train on FEVER with IR negative pages, Test
  • DR: Final score for recall/precision/MRR
  • DR: Score using Oracle RTE component
  • RTE: Pre-trained model, evaluate on FEVER
  • RTE: Train on FEVER bodies, evaluate on FEVER

Extra:

  • BiDAF: Precision/Recall of pretrained model
  • BiDAF: FEVER accuracy using pretrained model on DrQA pages
  • RTE: Train on BiDAF-retrieved model; evaluate P/R of BiDAF; evaluate FEVER score
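The DR recall/precision/MRR item in the list above could be scored along these lines. This is a sketch, not the repository's scorer: it assumes each claim comes with a ranked list of retrieved page titles and a set of gold page titles, macro-averages over claims, and skips claims with no gold pages.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1/rank of the first gold page in the ranked list, or 0.0 if none is retrieved."""
    for rank, page in enumerate(ranked, start=1):
        if page in gold:
            return 1.0 / rank
    return 0.0


def score_retrieval(
    examples: Iterable[Tuple[Sequence[str], Set[str]]]
) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and MRR over (ranked_pages, gold_pages) pairs."""
    precisions: List[float] = []
    recalls: List[float] = []
    rrs: List[float] = []
    for ranked, gold in examples:
        if not gold:  # e.g. NOT ENOUGH INFO claims have no gold pages
            continue
        hits = len(set(ranked) & gold)
        precisions.append(hits / len(ranked) if ranked else 0.0)
        recalls.append(hits / len(gold))
        rrs.append(reciprocal_rank(ranked, gold))
    n = len(rrs)
    if n == 0:
        return 0.0, 0.0, 0.0
    return sum(precisions) / n, sum(recalls) / n, sum(rrs) / n


p, r, mrr = score_retrieval([
    (["Page_A", "Page_B", "Page_C"], {"Page_B"}),  # gold page found at rank 2
    (["Page_D", "Page_E"], {"Page_X"}),            # gold page not retrieved
])
```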

Evaluation speed

How long is evaluation supposed to take? I'm running the evidence retrieval step on 8 vCPUs, 16 GB RAM and an SSD, and for the dev set it is projecting almost 10 hours.

Is this expected?
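One way to sanity-check a projection like this before committing to a full run is to time a small sample of claims and extrapolate. A sketch only: `process` stands for whatever per-claim retrieval call is being timed (a hypothetical placeholder), and the estimate assumes per-claim cost is roughly uniform across the split.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def project_runtime(
    process: Callable[[T], object], items: Sequence[T], sample_size: int = 50
) -> float:
    """Time `process` on the first `sample_size` items and extrapolate to all items (seconds)."""
    sample = items[:sample_size]
    if not sample:
        return 0.0
    start = time.perf_counter()
    for item in sample:
        process(item)
    elapsed = time.perf_counter() - start
    return elapsed / len(sample) * len(items)


# Hypothetical usage: hours = project_runtime(retrieve_one, dev_claims) / 3600
```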

NameError: name 'get_count_matrix' is not defined

I see this after cloning the latest version.
Traceback (most recent call last):
File "src/scripts/build_tfidf.py", line 34, in <module>
count_matrix, doc_dict = get_count_matrix(
NameError: name 'get_count_matrix' is not defined
Looking at drqascripts.retriever.build_tfidf:
class TfIdfBuilder(builtins.object):
    get_count_matrix(self)

But src/scripts/build_tfidf.py calls it as a module-level function rather than as a method on a TfIdfBuilder instance.
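The traceback is consistent with the help output: `get_count_matrix` is listed as a method of `TfIdfBuilder`, so a bare call fails with `NameError`. The toy class below only illustrates the difference between the two call styles; its constructor argument and return values are hypothetical stand-ins, since the real `TfIdfBuilder` signature is not shown here.

```python
from collections import Counter
from typing import Dict, List, Tuple


class TfIdfBuilder:
    """Toy stand-in for drqascripts.retriever.build_tfidf.TfIdfBuilder.

    Hypothetical API: the real constructor arguments and return types may differ.
    """

    def __init__(self, docs: Dict[str, str]) -> None:
        self.docs = docs

    def get_count_matrix(self) -> Tuple[List[Counter], Dict[str, int]]:
        # One term-count Counter per document plus a doc_id -> row index map.
        doc_dict = {doc_id: i for i, doc_id in enumerate(self.docs)}
        counts = [Counter(text.lower().split()) for text in self.docs.values()]
        return counts, doc_dict


# Calling get_count_matrix(...) bare raises NameError: it is not a module-level name.
# Call it as a method on an instance instead.
builder = TfIdfBuilder({"Greece": "Greece is a country", "Earth": "Earth is round"})
count_matrix, doc_dict = builder.get_count_matrix()
```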

TF-IDF flaw?

I played with the interactive mode and observed the following behaviour: the retrieved documents are relevant to only part of the claim, e.g.:

  • "The earth is round" gets songs like "round and round"
  • "Greece has 11 million people." is refuted by pages that are not about Greece, but millions of people
  • "Alcohol causes cancer" gets nothing about alcohol

I am wondering whether:

  • a simpler check would work better, one that favours documents containing as many of the claim's words as possible (possibly ignoring stopwords). Maybe a variant of ROUGE?
  • an alternative would be to train a document ranker, but maybe that wouldn't be part of the "baseline" approach.
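The first suggestion can be sketched as a coverage score: the fraction of the claim's non-stopword tokens that also appear in a candidate document, close in spirit to ROUGE-1 recall. The tiny stopword list and regex tokenizer are simplifying assumptions, and this illustrates the proposed re-ranking signal rather than anything the baseline implements.

```python
import re
from typing import Iterable, List, Set, Tuple

# Deliberately tiny stopword list (assumption); a real one would be much larger.
STOPWORDS: Set[str] = {"the", "a", "an", "is", "are", "has", "have", "of", "in", "and", "to"}


def content_tokens(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens with stopwords removed."""
    return {tok for tok in re.findall(r"[a-z0-9]+", text.lower()) if tok not in STOPWORDS}


def claim_coverage(claim: str, document: str) -> float:
    """Fraction of the claim's content words that occur in the document."""
    claim_words = content_tokens(claim)
    if not claim_words:
        return 0.0
    return len(claim_words & content_tokens(document)) / len(claim_words)


def rerank(claim: str, docs: Iterable[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Re-order (title, text) pairs by claim coverage, highest first."""
    scored = [(title, claim_coverage(claim, text)) for title, text in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rerank("The earth is round", [
    ("Round_and_Round", "Round and round is a song"),
    ("Earth", "The Earth is an approximately round planet"),
])
print(ranked)  # [('Earth', 1.0), ('Round_and_Round', 0.5)]
```

A document that matches only "round" now scores below one that covers both content words of the claim.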

Cannot find "OnlineTfidfDocRanker"

Hi,
The file "process_tfidf_drqa.py" contains a line that imports OnlineTfidfDocRanker ("from drqascripts.retriever.build_tfidf_lines import OnlineTfidfDocRanker"); however, I cannot find build_tfidf_lines anywhere in the provided codebase. Where can I find this file?
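A quick `importlib` check can confirm whether the module the script expects is importable in the current environment at all. This only reports availability; it does not locate the missing file. The helper name `module_available` is illustrative.

```python
import importlib.util


def module_available(name: str) -> bool:
    """True if the dotted module `name` can be found without importing it fully.

    find_spec imports parent packages first, so a missing parent package
    raises an ImportError subclass, which is treated as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        return False


if not module_available("drqascripts.retriever.build_tfidf_lines"):
    print("drqascripts.retriever.build_tfidf_lines is not importable in this environment")
```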

Got TypeError: unhashable type: 'list' when running eval_mrr.py

I got the error

Traceback (most recent call last):
File "src/scripts/retrieval/document/eval_mrr.py", line 22, in <module>
evidence = set([t[1] for t in js["evidence"] if isinstance(t,list) and len(t)>1])
TypeError: unhashable type: 'list'

when running

python src/scripts/retrieval/document/eval_mrr.py --split dev --count 5

It seems that t[1] can be a list, but isn't it supposed to be a string?
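In the FEVER shared-task data format, `evidence` is a list of evidence sets, and each set is a list of `[annotation_id, evidence_id, page, sentence_id]` entries, so `t[1]` is itself a list whenever a set holds more than one entry. Assuming that format (the script may have been written for an earlier one), the gold pages can be collected by walking both levels. The example line below uses made-up, illustrative values.

```python
import json
from typing import Any, Dict, Set


def gold_pages(example: Dict[str, Any]) -> Set[str]:
    """Collect gold Wikipedia page titles from a FEVER-style example.

    Assumes evidence = [[[annotation_id, evidence_id, page, sentence_id], ...], ...];
    NOT ENOUGH INFO entries carry null (None) for the page and are skipped.
    """
    pages: Set[str] = set()
    for evidence_set in example.get("evidence", []):
        for item in evidence_set:
            if isinstance(item, list) and len(item) > 2 and item[2] is not None:
                pages.add(item[2])
    return pages


# Illustrative line: two evidence sets, the first with two sentences.
line = '{"evidence": [[[1, 10, "Page_A", 0], [1, 10, "Page_B", 3]], [[2, 20, "Page_A", 1]]]}'
evidence = gold_pages(json.loads(line))
```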

pytorch 0.3.1 seems too outdated to install

Hi,
I got an error when I tried to build the Dockerfile:

command:

$ sudo docker build .

error:

Step 21/23 : RUN conda install -y pytorch=0.3.1 torchvision -c pytorch
...
CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/pytorch/linux-64/pytorch-0.3.1-py36_cuda8.0.61_cudnn7.1.2_3.tar.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

Cheers,

Some training data files missing

When trying to train a model I get an error:

FileNotFoundError: [Errno 2] No such file or directory: 'data/fever/train.ns.pages.p1.jsonl'

I believe a step is missing from the instructions, perhaps running the script src/scripts/dataset/download_dataset.py with the appropriate parameters to get the needed files?

Thanks,
Slavko
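A pre-flight check can turn a late `FileNotFoundError` into an upfront list of what still needs to be generated. Sketch only: which files a given training configuration needs, and which script produces each one, are not stated here, so the list below contains just the path from the error above.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(paths: Iterable[str]) -> List[str]:
    """Return, in order, the paths that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


# Path taken from the error above; add any other inputs the config references.
missing = missing_files(["data/fever/train.ns.pages.p1.jsonl"])
if missing:
    print("Generate these before training:", ", ".join(missing))
```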

Error in the initialization regex

When I run src/scripts/rte/da/eval_da.py, it prints "Did not use initialization regex that was passed: .*token_embedder_tokens\._projection.*weight". My allennlp version is 0.2.3. How can I fix this?
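For context: AllenNLP's initializer logs this message as a warning when a regex in the initializer configuration matches none of the model's parameter names (it uses `re.search`), so the weights that regex targets are simply not re-initialised. Whether that matters depends on whether the model is expected to have such a projection layer. A quick way to see which regexes match is to test them against the parameter names, e.g. from `model.named_parameters()`; the names below are hypothetical.

```python
import re
from typing import Dict, Iterable, List


def regex_matches(regexes: Iterable[str], param_names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each initializer regex to the parameter names it matches (re.search semantics)."""
    names = list(param_names)
    return {rx: [n for n in names if re.search(rx, n)] for rx in regexes}


# Hypothetical parameter names; in practice use [n for n, _ in model.named_parameters()].
params = [
    "_text_field_embedder.token_embedder_tokens.weight",
    "_attend_feedforward._module._linear_layers.0.weight",
]
matches = regex_matches([r".*token_embedder_tokens\._projection.*weight"], params)
unused = [rx for rx, hits in matches.items() if not hits]
print(unused)  # the projection regex matches nothing here, so it would be reported as unused
```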

KeyError when running DrQA

When running the following command:

PYTHONPATH=src python src/scripts/retrieval/ir.py --db data/fever/fever.db --model data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file data/fever-data/dev.jsonl --out-file data/fever/dev.sentences.p5.s5.jsonl --max-page 5 --max-sent 5

The new version of the code gives me a KeyError.

The exception comes from this line in the function count in drqascripts/retriever/build_tfidf.py:
col.extend([DOC2IDX[doc_id]] * len(counts))

I do not get this error if I set the flag --parallel to false. I'm not sure, but I am guessing that the issue lies in the multiprocessing part of the code.

Thanks for your help!
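One common cause of this pattern (a guess, not confirmed from the code): a module-level global such as `DOC2IDX` is filled in the parent process, and worker processes that do not inherit it, for example under the `spawn` start method, see an empty or stale copy. A sketch of a more robust shape passes the mapping to the worker function explicitly. `count_doc` and its arguments are hypothetical; `map` is used serially so the example runs anywhere, and for a large mapping a `multiprocessing.Pool` `initializer` that sets it once per worker would avoid re-pickling it per task.

```python
from collections import Counter
from functools import partial
from typing import Dict, List, Tuple


def count_doc(doc: Tuple[str, str], doc2idx: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return (term, column, count) triples for one (doc_id, text) pair.

    The doc_id -> column mapping is an explicit argument rather than a global,
    so a worker process cannot end up with a missing or stale copy of it.
    """
    doc_id, text = doc
    column = doc2idx[doc_id]  # a KeyError here means the mapping and the docs disagree
    counts = Counter(text.lower().split())
    return [(term, column, n) for term, n in sorted(counts.items())]


docs = [("d1", "round and round"), ("d2", "the earth")]
doc2idx = {doc_id: i for i, (doc_id, _) in enumerate(docs)}
worker = partial(count_doc, doc2idx=doc2idx)
# Serial map for portability; multiprocessing.Pool(...).map(worker, docs) has the same shape.
triples = [triple for chunk in map(worker, docs) for triple in chunk]
```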

Error Analysis

  • how often did DR return the right page?
  • how often did SR return the right page?
  • how often did SR return the original evidence?
  • when SR returned different evidence, how do the BLEU/ROUGE similarities between the claim and the returned evidence compare with those between the claim and the gold evidence?
  • Error coding scheme
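The first three questions reduce to simple rates once the retrieved and gold items are available per claim. A sketch, assuming pages are titles and sentences are `(page, sentence_id)` pairs; following the FEVER convention, evidence counts as found when at least one complete gold evidence set is among the retrieved sentences. The data below is illustrative, and the BLEU/ROUGE comparison and coding scheme are not covered.

```python
from typing import Iterable, Sequence, Set, Tuple

Sentence = Tuple[str, int]  # (page title, sentence id)


def any_hit(retrieved: Set[str], gold: Set[str]) -> bool:
    """True if at least one gold page was retrieved."""
    return bool(retrieved & gold)


def full_evidence_found(retrieved: Set[Sentence], gold_sets: Sequence[Set[Sentence]]) -> bool:
    """True if at least one complete gold evidence set was retrieved."""
    return any(gold <= retrieved for gold in gold_sets)


def rate(flags: Iterable[bool]) -> float:
    """Fraction of True values; 0.0 for an empty input."""
    values = list(flags)
    return sum(values) / len(values) if values else 0.0


dr_pages = [{"Greece", "Athens"}, {"Round_and_Round"}]
gold_pages = [{"Greece"}, {"Earth"}]
page_rate = rate(any_hit(r, g) for r, g in zip(dr_pages, gold_pages))

sr_sentences = [{("Greece", 0), ("Greece", 3)}, {("Round_and_Round", 1)}]
gold_sets = [[{("Greece", 0)}], [{("Earth", 0), ("Earth", 2)}]]
evidence_rate = rate(full_evidence_found(r, g) for r, g in zip(sr_sentences, gold_sets))
```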

installation fails with pip 10.0.1

Building wheels for collected packages: drqa, fever-scorer, drqa, fever-scorer
Running setup.py bdist_wheel for drqa ... done
Stored in directory: /tmp/pip-ephem-wheel-cache-cfhla23v/wheels/25/a8/71/6390f88d8b3ecda4c32998985670851ed7281bfa8ced27196e
Running setup.py bdist_wheel for fever-scorer ... done
Stored in directory: /tmp/pip-ephem-wheel-cache-cfhla23v/wheels/e0/c6/1a/8ff7f96802122bf337bfc8e05852f7d5618a6cffc95b5ee624
Running setup.py bdist_wheel for drqa ... error
Complete output from command /home/ubuntu/anaconda3/envs/fever/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-vnfl6_v5/drqa/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-5hao0iio --python-tag cp36:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/ubuntu/anaconda3/envs/fever/lib/python3.6/tokenize.py", line 452, in open
buffer = _builtin_open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-vnfl6_v5/drqa/setup.py'


Failed building wheel for drqa
Running setup.py clean for drqa
Complete output from command /home/ubuntu/anaconda3/envs/fever/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-vnfl6_v5/drqa/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" clean --all:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/ubuntu/anaconda3/envs/fever/lib/python3.6/tokenize.py", line 452, in open
buffer = _builtin_open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-vnfl6_v5/drqa/setup.py'


Failed cleaning build dir for drqa
Running setup.py bdist_wheel for fever-scorer ... error
Complete output from command /home/ubuntu/anaconda3/envs/fever/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-vnfl6_v5/fever-scorer/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-3rmxr6dz --python-tag cp36:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/ubuntu/anaconda3/envs/fever/lib/python3.6/tokenize.py", line 452, in open
buffer = _builtin_open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-vnfl6_v5/fever-scorer/setup.py'


Failed building wheel for fever-scorer
Running setup.py clean for fever-scorer
Complete output from command /home/ubuntu/anaconda3/envs/fever/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-vnfl6_v5/fever-scorer/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" clean --all:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/ubuntu/anaconda3/envs/fever/lib/python3.6/tokenize.py", line 452, in open
buffer = _builtin_open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-vnfl6_v5/fever-scorer/setup.py'


Failed cleaning build dir for fever-scorer

ImportError: cannot import name 'Dataset'

When I trained the Decomposable Attention model, I got this error:

File "src/scripts/rte/da/train_da.py", line 9, in <module>
    from allennlp.data import Vocabulary, Dataset, DataIterator, DatasetReader, Tokenizer, TokenIndexer
ImportError: cannot import name 'Dataset'

I installed allennlp from source, and when I looked in the data folder, there was nothing called Dataset. Does anybody know how to fix this problem? Thanks!

Failed to build DrQA

Never mind, it looks like it was just a permission issue.

My hunch is that the baseline uses a specific version of DrQA rather than the one in DrQA's own repo. However, when I try to install it with the following command, as specified in the requirements:

pip3 install git+git://github.com/j6mes/drqa@fever#egg=DrQA-0.1.3

I got the error:

Building wheels for collected packages: drqa, drqa
  Running setup.py bdist_wheel for drqa ... done
  Stored in directory: /private/var/folders/gk/hj0gqfws7sj4s3dk8rnckplc0000gn/T/pip-ephem-wheel-cache-ok8ikj6r/wheels/2a/62/41/ddc1e0efc8a4f3becd45012e6624752e2fe5fbf733a5b61d3a
  Running setup.py bdist_wheel for drqa ... error
  Complete output from command /usr/local/opt/python3/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/private/var/folders/gk/hj0gqfws7sj4s3dk8rnckplc0000gn/T/pip-install-6vmieha8/drqa/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /private/var/folders/gk/hj0gqfws7sj4s3dk8rnckplc0000gn/T/pip-wheel-t6ci4qrb --python-tag cp36:
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tokenize.py", line 452, in open
      buffer = _builtin_open(filename, 'rb')
  FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/gk/hj0gqfws7sj4s3dk8rnckplc0000gn/T/pip-install-6vmieha8/drqa/setup.py'
  
  ----------------------------------------
  Failed building wheel for drqa
  Running setup.py clean for drqa
  Complete output from command /usr/local/opt/python3/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/private/var/folders/gk/hj0gqfws7sj4s3dk8rnckplc0000gn/T/pip-install-6vmieha8/drqa/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" clean --all:
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tokenize.py", line 452, in open
      buffer = _builtin_open(filename, 'rb')
  FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/gk/hj0gqfws7sj4s3dk8rnckplc0000gn/T/pip-install-6vmieha8/drqa/setup.py'
  
  ----------------------------------------
  Failed cleaning build dir for drqa
Successfully built drqa
Failed to build drqa

Can somebody help please? Thank you!
