Comments (6)
Hi Slavko,
The correct training data is provided. But because we don't have any training data for NotEnoughInfo claims, we have to sample it. There are a few ways to do this, including sampling sentences from the nearest page, which is what the file you're missing contains.
These files are generated by running the instructions from Step 3 in the readme file:
PYTHONPATH=src python src/scripts/retrieval/document/batch_ir_ns.py --model data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --count 1 --split train
PYTHONPATH=src python src/scripts/retrieval/document/batch_ir_ns.py --model data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --count 1 --split dev
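To illustrate the sampling idea only, here is a minimal sketch, assuming the nearest page for a claim is already available as a list of sentences. The function name `sample_nei_sentence` and its arguments are hypothetical and are not taken from the repository; the actual scripts may select sentences differently.

```python
import random


def sample_nei_sentence(page_sentences, rng=random):
    """Pick one sentence from the nearest page to stand in as evidence
    for a NotEnoughInfo claim (hypothetical helper, not the repo's code).

    page_sentences: list of sentence strings from the retrieved page.
    rng: any object with a .choice() method, e.g. random.Random(seed).
    Returns None when the page has no usable sentences.
    """
    # Skip empty lines, which can appear in pre-split page text.
    candidates = [s for s in page_sentences if s.strip()]
    if not candidates:
        return None
    return rng.choice(candidates)


# Usage with a fixed seed so the sampled sentence is reproducible.
sentences = ["First sentence of the page.", "", "Second sentence of the page."]
sampled = sample_nei_sentence(sentences, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the sampled training set stable between runs.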
Let me know if you need more help.
J
from naacl2018-fever.
Thanks.
I ran the instructions in a script and I get a MemoryError when running:
PYTHONPATH=src python src/scripts/build_tfidf.py data/fever/fever.db data/index/
How much memory do I need for this? I am currently running on 32GB of RAM.
Slavko
Hmm. It runs fine on my Macbook with less memory. Perhaps you have more threads, each of which requires its own memory.
You could try reducing the number of threads with the --num-workers parameter.
Probably there is some environment problem. It also works for me on my Macbook, but fails on Ubuntu 17.10.1 (Anaconda Python 3.6 64-bit; 32GB RAM) when running:
PYTHONPATH=src python src/scripts/build_tfidf.py data/fever/fever.db data/index/ --num-workers=1
04/15/2018 08:03:25 AM: [ Counting words... ]
04/15/2018 08:06:59 AM: [ Mapping... ]
04/15/2018 08:07:00 AM: [ -------------------------Batch 1/11------------------------- ]
04/15/2018 08:29:48 AM: [ -------------------------Batch 2/11------------------------- ]
04/15/2018 08:55:17 AM: [ -------------------------Batch 3/11------------------------- ]
04/15/2018 09:20:41 AM: [ -------------------------Batch 4/11------------------------- ]
04/15/2018 09:46:34 AM: [ -------------------------Batch 5/11------------------------- ]
04/15/2018 10:12:20 AM: [ -------------------------Batch 6/11------------------------- ]
04/15/2018 10:37:12 AM: [ -------------------------Batch 7/11------------------------- ]
04/15/2018 11:02:01 AM: [ -------------------------Batch 8/11------------------------- ]
04/15/2018 11:27:15 AM: [ -------------------------Batch 9/11------------------------- ]
04/15/2018 11:53:19 AM: [ -------------------------Batch 10/11------------------------- ]
04/15/2018 12:19:27 PM: [ -------------------------Batch 11/11------------------------- ]
04/15/2018 12:19:27 PM: [ Creating sparse matrix... ]
Traceback (most recent call last):
  File "src/scripts/build_tfidf.py", line 35, in <module>
    args, 'sqlite', {'db_path': args.db_path}
  File "/home/slavkoz/anaconda3/envs/fever/lib/python3.6/site-packages/drqascripts/retriever/build_tfidf.py", line 123, in get_count_matrix
    (data, (row, col)), shape=(args.hash_size, len(doc_ids))
  File "/home/slavkoz/anaconda3/envs/fever/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 51, in __init__
    other = self.__class__(coo_matrix(arg1, shape=shape))
  File "/home/slavkoz/anaconda3/envs/fever/lib/python3.6/site-packages/scipy/sparse/coo.py", line 158, in __init__
    self.data = np.array(obj, copy=copy)
MemoryError
-> memory usage gets high and then the process is killed:
top - 12:29:02 up 1 day, 22:14, 0 users, load average: 1.56, 2.08, 1.65
Tasks: 189 total, 2 running, 187 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.0 us, 0.1 sy, 0.0 ni, 89.8 id, 3.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32894132 total, 2130528 free, 30669144 used, 94460 buff/cache
KiB Swap: 2097148 total, 1168516 free, 928632 used. 1901324 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24697 slavkoz 20 0 31.469g 0.028t 2836 R 100.0 92.1 13:32.46 python
1 root 20 0 220324 2044 556 S 0.0 0.0 0:03.82 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
I will check what the problem may be and report back.
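The traceback fails in np.array(obj, copy=copy) while scipy builds a matrix from the (data, (row, col)) triplets, so a rough lower bound on the memory needed at that step is the size of the three arrays. The sketch below is back-of-the-envelope arithmetic only: the number of non-zero entries is a made-up placeholder (the real count is not reported in the log), the 8-byte item sizes are assumptions, and the function names are hypothetical.

```python
def triplet_array_bytes(nnz, data_itemsize=8, index_itemsize=8):
    """Bytes needed to hold nnz (data, row, col) triplets as flat arrays.

    Assumes one data value plus two indices per non-zero entry.
    The default item sizes (8 bytes each) are assumptions, not measured.
    """
    return nnz * (data_itemsize + 2 * index_itemsize)


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


# Placeholder count of non-zero entries; NOT taken from the log above.
assumed_nnz = 500_000_000

estimate = triplet_array_bytes(assumed_nnz)
print("arrays alone: %.1f GiB" % to_gib(estimate))
```

This counts only the final arrays. If the triplets are first collected in Python lists, those lists stay alive while the arrays are created and each list entry costs more than a packed array element, so the real peak would be noticeably higher than this figure.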
For reference: facebookresearch/DrQA#30