Comments (19)
From your `free` output, it looks like you do not have enough RAM on your machine. You need at least around 15 GB, and it looks like you have 8 (if the units you posted are MB).
from drqa.
Do you still have overcommit enabled? You might need that to run with the tokenizers, as each tokenizer process allocates (but doesn't use all of) memory for the JVM.
You can also see if running with `--tokenizer spacy` works.
Edit: Try `--tokenizer regexp` first, as you'd need to `pip install spacy && python -m spacy download en` for the former.
from drqa.
- Try running with overcommit enabled (`echo 1 > /proc/sys/vm/overcommit_memory`).
- If that still errors, try running `python scripts/pipeline/interactive.py --tokenizer regexp`; it uses a less resource-intensive tokenizer (which is where your machine is failing).
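Before flipping the setting, it can help to confirm what the kernel's current overcommit policy is. A minimal Python sketch (the helper name `overcommit_mode` is mine, not part of DrQA):

```python
# Minimal sketch (not DrQA code): report the kernel's overcommit policy.
# 0 = heuristic overcommit (the default), 1 = always overcommit, 2 = strict accounting.
def overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return None  # procfs entry missing (non-Linux system)

print(overcommit_mode())
```

A value of 1 is what the `echo` command above sets; it persists only until reboot unless made permanent via sysctl configuration.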
from drqa.
Okay, so I will try with a GPU and try to reduce its execution time... and thanks a lot once again... you've helped a lot and also contributed to accomplishing my passion project... 😄
from drqa.
You are very welcome!
from drqa.
How much free RAM does your system have? Is it possible your download was interrupted and got corrupted?
from drqa.
Below are the `free` command results:

```
              total        used        free      shared  buff/cache   available
Mem:           7484          92        7176           9         215        7158
Swap:             0           0           0
```
from drqa.
I set the value of /proc/sys/vm/overcommit_memory to 1 using `echo 1 > /proc/sys/vm/overcommit_memory`, ran interactive.py again, and it shows me the message below:

```
deepakchawla35@deepak-server:~/DrQA$ python scripts/pipeline/interactive.py
08/21/2017 05:49:49 PM: [ Running on CPU only. ]
08/21/2017 05:49:49 PM: [ Initializing pipeline... ]
08/21/2017 05:49:49 PM: [ Initializing document ranker... ]
08/21/2017 05:49:49 PM: [ Loading /home/deepakchawla35/DrQA/data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
Killed
```
now what should I do...??
from drqa.
Ok, I will change it from 8 GB to 15 GB. But when I changed the value from 0 to 1, it didn't show me any memory-related error and ran smoothly; it just printed a "Killed" message. What is the reason behind that killed message?
from drqa.
Setting the value from 0 to 1 enabled overcommit, always. In overcommit mode the Linux kernel always lets a memory allocation like `malloc` succeed. But when your program actually uses that memory, you run out of space, and the kernel OOM killer kills the process (hence your "Killed" message).
If, on the other hand, overcommit is not enabled, the kernel will not let programs allocate more virtual memory than is physically available: `malloc` returns NULL, and the program (in this case numpy) exits with an error (MemoryError).
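A small Python illustration of that allocate-vs-use distinction (this shows generic Linux virtual-memory behavior, it is not DrQA code):

```python
import mmap

# An anonymous mapping reserves virtual address space; the kernel commits
# physical pages only when they are first written.
GIB = 1 << 30
m = mmap.mmap(-1, GIB)   # reserve 1 GiB; near-instant, uses almost no RAM
m[0] = 0x41              # touching a page commits only that single page
# Writing to *every* page is what would force the kernel to commit them all;
# on a machine without enough free memory, that is when the OOM killer strikes.
reserved = len(m)
m.close()
```

This is exactly the pattern behind the "Killed" message: the reservation succeeded under overcommit, but filling the pages later exceeded physical memory.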
from drqa.
Okay, got your point. I have now changed my RAM size. `free -m` before running the Python file:

```
              total        used        free      shared  buff/cache   available
Mem:          22099         148       21876          10          74       21708
Swap:             0           0           0
```
```
deepakchawla35@deepak-server:~/DrQA$ python scripts/pipeline/interactive.py
08/22/2017 03:17:25 AM: [ Running on CPU only. ]
08/22/2017 03:17:25 AM: [ Initializing pipeline... ]
08/22/2017 03:17:25 AM: [ Initializing document ranker... ]
08/22/2017 03:17:25 AM: [ Loading /home/deepakchawla35/DrQA/data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
08/22/2017 03:19:24 AM: [ Initializing document reader... ]
08/22/2017 03:19:24 AM: [ Loading model /home/deepakchawla35/DrQA/data/reader/multitask.mdl ]
08/22/2017 03:19:31 AM: [ Initializing tokenizers and document retrievers... ]
Traceback (most recent call last):
  File "scripts/pipeline/interactive.py", line 70, in <module>
    tokenizer=args.tokenizer
  File "/home/deepakchawla35/DrQA/drqa/pipeline/drqa.py", line 140, in __init__
    initargs=(tok_class, tok_opts, db_class, db_opts, fixed_candidates)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
```
And while the Python file is running, `free -m` reports a different amount of free RAM:

```
free -m
              total        used        free      shared  buff/cache   available
Mem:          22099         148       13961          10        7989       21628
Swap:             0           0           0
```
from drqa.
No, overcommit is currently disabled:

```
deepakchawla35@deepak-server:~/DrQA$ cat /proc/sys/vm/overcommit_memory
0
```

> You can also see if running with --tokenizer spacy works.

I don't get your point...
from drqa.
okay, let me try...
from drqa.
Now it is working perfectly... thank you so much! But it is giving me the wrong prediction for some questions:
```
>>> process('when facebook company ipo launched')
08/22/2017 03:49:42 AM: [ Processing 1 queries... ]
08/22/2017 03:49:42 AM: [ Retrieving top 5 docs... ]
08/22/2017 03:49:43 AM: [ Reading 323 paragraphs... ]
08/22/2017 03:49:51 AM: [ Processed 1 queries in 8.7226 (s) ]
Top Predictions:
+------+--------+-------------------------------------+--------------+-----------+
| Rank | Answer | Doc                                 | Answer Score | Doc Score |
+------+--------+-------------------------------------+--------------+-----------+
|  1   |  2009  | Initial public offering of Facebook |    49060     |  248.07   |
+------+--------+-------------------------------------+--------------+-----------+
Contexts:
[ Doc = Initial public offering of Facebook ]
To ensure that early investors would retain control of the company, Facebook in 2009 instituted a dual-class stock structure. After the IPO, Zuckerberg was to retain a 22% ownership share in Facebook and was to own 57% of the voting shares. The document also stated that the company was seeking to raise $5 billion, which would make it one of the largest IPOs in tech history and the biggest in Internet history.
```
```
>>> process('when facebook company IPO launched')
08/22/2017 03:51:07 AM: [ Processing 1 queries... ]
08/22/2017 03:51:07 AM: [ Retrieving top 5 docs... ]
08/22/2017 03:51:07 AM: [ Reading 323 paragraphs... ]
08/22/2017 03:51:14 AM: [ Processed 1 queries in 6.7024 (s) ]
Top Predictions:
+------+--------+-------------------------------------+--------------+-----------+
| Rank | Answer | Doc                                 | Answer Score | Doc Score |
+------+--------+-------------------------------------+--------------+-----------+
|  1   |  2012  | Initial public offering of Facebook |  4.8931e+05  |  248.07   |
+------+--------+-------------------------------------+--------------+-----------+
Contexts:
[ Doc = Initial public offering of Facebook ]
The social networking company Facebook held its initial public offering (IPO) on Friday, May 18, 2012. The IPO was the biggest in technology and one of the biggest in Internet history, with a peak market capitalization of over $104 billion. Media pundits called it a "cultural touchstone."
```
```
>>> process('who is father of deep learning')
08/22/2017 03:52:47 AM: [ Processing 1 queries... ]
08/22/2017 03:52:47 AM: [ Retrieving top 5 docs... ]
08/22/2017 03:52:48 AM: [ Reading 479 paragraphs... ]
08/22/2017 03:52:55 AM: [ Processed 1 queries in 7.3674 (s) ]
Top Predictions:
+------+---------------------+---------------+--------------+-----------+
| Rank | Answer              | Doc           | Answer Score | Doc Score |
+------+---------------------+---------------+--------------+-----------+
|  1   | Juergen Schmidhuber | Deep learning |  3.7192e+08  |  453.99   |
+------+---------------------+---------------+--------------+-----------+
Contexts:
[ Doc = Deep learning ]
Deep learning algorithms transform their inputs through more layers than shallow learning algorithms. At each layer, the signal is transformed by a processing unit, like an artificial neuron, whose parameters are 'learned' through training. A chain of transformations from input to output is a "credit assignment path" (CAP). CAPs describe potentially causal connections between input and output and may vary in length – for a feedforward neural network, the depth of the CAPs (thus of the network) is the number of hidden layers plus one (as the output layer is also parameterized), but for recurrent neural networks, in which a signal may propagate through a layer more than once, the CAP is potentially unlimited in length. There is no universally agreed upon threshold of depth dividing shallow learning from deep learning, but most researchers in the field agree that deep learning has multiple nonlinear layers (CAP > 2) and Juergen Schmidhuber considers CAP > 10 to be very deep learning.
```
from drqa.
I am glad that it is working.
DrQA is just an AI research project -- of course there is no guarantee that it will answer all questions correctly (or in the case of this model be invariant to spelling, capitalization, or phrasing). In fact from our reported evaluations on several QA datasets, you can expect that DrQA will get most questions wrong (but also a fair amount correct). Hopefully this model can be a baseline for machine reading at scale that someone like you can beat 😉.
Then again, the answers to some of these questions are subjective. Perhaps Juergen wouldn't mind the answer to your question 3...
from drqa.
Okay. Are you improving or working on its QA datasets to give more accurate answers? And one more thing: currently it takes a lot of time to give answers, and I want it to answer in at most 3 seconds. What should I do to achieve this?
from drqa.
Reading comprehension and open-domain QA is an active area of research, for FAIR and others.
To improve the runtime performance of DrQA you will need a machine with better specs. It also scales better with large batches (faster average time per question).

- Ideally you will have a machine with a GPU and cuDNN. The higher quality the GPU, the better.
- Having more CPU cores (especially if you are lacking a GPU) is also very helpful. The prediction pipeline runs on both CPU and GPU. More than 15 cores is good; more if not using a GPU.
- Running with large batch sizes (say, up to 1000 questions) is considerably more efficient than asking single questions. You can see how batching is done in `scripts/pipeline/predict.py`, for example.
- As an immediate measure, you can reduce the number of documents DrQA reads per question (the `n_docs` parameter in `process`; the default is 5). This will hurt your accuracy, however.
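To see why batching improves the average time per question, here is a toy cost model (the numbers are made up for illustration; measure your own pipeline's overhead):

```python
# Toy latency model (illustrative numbers, not DrQA measurements): each
# pipeline invocation pays a fixed overhead (model setup, retrieval batching,
# GPU transfers) plus a marginal cost per question.
FIXED_OVERHEAD_S = 2.0   # assumed fixed cost per call, in seconds
PER_QUESTION_S = 0.5     # assumed marginal cost per question, in seconds

def avg_latency(batch_size):
    """Average seconds per question when batch_size questions share one call."""
    return (FIXED_OVERHEAD_S + PER_QUESTION_S * batch_size) / batch_size

print(avg_latency(1))    # -> 2.5 s/question
print(avg_latency(100))  # -> 0.52 s/question
```

The fixed overhead is amortized across the batch, so the average per-question latency approaches the marginal cost as the batch grows, even though total wall-clock time for the batch is longer.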
from drqa.
😊
from drqa.
Hi, I am having the same issue with 8 GB RAM and 4 CPU cores. Can you help us?

```
(pt) root@ml:~/DrQA# python3 scripts/pipeline/interactive.py --tokenizer regexp
Traceback (most recent call last):
  File "scripts/pipeline/interactive.py", line 16, in <module>
    from drqa import pipeline
ImportError: No module named 'drqa'
```
from drqa.