Comments (19)
From your `free` output, it looks like you do not have enough RAM on your machine. You need at least around 15 GB, and it looks like you have 8 (if the units you posted are MB).
from drqa.
Do you still have overcommit enabled? You might need that to run with the tokenizers, as each tokenizer process allocates (but doesn't use all of) memory for the JVM.
You can also see if running with `--tokenizer spacy` works.
Edit: Try `--tokenizer regexp` first, as you'd need to `pip install spacy && python -m spacy download en` for the former.
from drqa.
- Try running with overcommit enabled (`echo 1 > /proc/sys/vm/overcommit_memory`).
- If that still errors, try running `python scripts/pipeline/interactive.py --tokenizer regexp`; it uses a less resource-intensive tokenizer (which is where your machine is failing).
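Before flipping the setting, it can help to confirm what the kernel's current overcommit policy is. A minimal Python sketch (the helper name `overcommit_mode` is mine, not part of DrQA):

```python
# Minimal sketch (not DrQA code): report the kernel's overcommit policy.
# 0 = heuristic overcommit (the default), 1 = always overcommit, 2 = strict accounting.
def overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return None  # procfs entry missing (non-Linux system)

print(overcommit_mode())
```

A value of 1 is what the `echo` command above sets; it persists only until reboot unless made permanent via sysctl configuration.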
from drqa.
Okay, so I will try with a GPU and try to reduce its execution time... and thanks a lot once again... you've helped a lot and also contributed to accomplishing my passion project... 😄
from drqa.
You are very welcome!
from drqa.
How much free RAM does your system have? Is it possible your download was interrupted and got corrupted?
from drqa.
Below are the `free` command results:

```
              total        used        free      shared  buff/cache   available
Mem:           7484          92        7176           9         215        7158
Swap:             0           0           0
```
from drqa.
I set the value of /proc/sys/vm/overcommit_memory to 1 using `echo 1 > /proc/sys/vm/overcommit_memory`, ran interactive.py again, and it shows me the message below:

```
deepakchawla35@deepak-server:~/DrQA$ python scripts/pipeline/interactive.py
08/21/2017 05:49:49 PM: [ Running on CPU only. ]
08/21/2017 05:49:49 PM: [ Initializing pipeline... ]
08/21/2017 05:49:49 PM: [ Initializing document ranker... ]
08/21/2017 05:49:49 PM: [ Loading /home/deepakchawla35/DrQA/data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
Killed
```
now what should I do...??
from drqa.
Ok, I will change it from 8 GB to 15 GB. But when I changed the value from 0 to 1, it didn't show me any memory-related error and ran smoothly; it just printed a "Killed" message. What is the reason behind that killed message?
from drqa.
Setting the value from 0 to 1 enabled overcommit, always. In overcommit mode the Linux kernel always lets a memory allocation like `malloc` succeed. But when your program actually uses that memory, you run out of space, and the kernel OOM killer kills the process (hence your "Killed" message).
If, on the other hand, overcommit is not enabled, the kernel will not let programs allocate more virtual memory than is physically available: `malloc` returns NULL, and the program (in this case numpy) exits with an error (MemoryError).
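A small Python illustration of that allocate-vs-use distinction (this shows generic Linux virtual-memory behavior, it is not DrQA code):

```python
import mmap

# An anonymous mapping reserves virtual address space; the kernel commits
# physical pages only when they are first written.
GIB = 1 << 30
m = mmap.mmap(-1, GIB)   # reserve 1 GiB; near-instant, uses almost no RAM
m[0] = 0x41              # touching a page commits only that single page
# Writing to *every* page is what would force the kernel to commit them all;
# on a machine without enough free memory, that is when the OOM killer strikes.
reserved = len(m)
m.close()
```

This is exactly the pattern behind the "Killed" message: the reservation succeeded under overcommit, but filling the pages later exceeded physical memory.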
from drqa.
Okay, got your point. I have now changed my RAM size. `free -m` before running the Python file:

```
              total        used        free      shared  buff/cache   available
Mem:          22099         148       21876          10          74       21708
Swap:             0           0           0
```
```
deepakchawla35@deepak-server:~/DrQA$ python scripts/pipeline/interactive.py
08/22/2017 03:17:25 AM: [ Running on CPU only. ]
08/22/2017 03:17:25 AM: [ Initializing pipeline... ]
08/22/2017 03:17:25 AM: [ Initializing document ranker... ]
08/22/2017 03:17:25 AM: [ Loading /home/deepakchawla35/DrQA/data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
08/22/2017 03:19:24 AM: [ Initializing document reader... ]
08/22/2017 03:19:24 AM: [ Loading model /home/deepakchawla35/DrQA/data/reader/multitask.mdl ]
08/22/2017 03:19:31 AM: [ Initializing tokenizers and document retrievers... ]
Traceback (most recent call last):
  File "scripts/pipeline/interactive.py", line 70, in <module>
    tokenizer=args.tokenizer
  File "/home/deepakchawla35/DrQA/drqa/pipeline/drqa.py", line 140, in __init__
    initargs=(tok_class, tok_opts, db_class, db_opts, fixed_candidates)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
```
And while the Python file is running, `free -m` reports a different amount of free RAM:

```
free -m
              total        used        free      shared  buff/cache   available
Mem:          22099         148       13961          10        7989       21628
Swap:             0           0           0
```
from drqa.
No, overcommit is currently disabled:

```
deepakchawla35@deepak-server:~/DrQA$ cat /proc/sys/vm/overcommit_memory
0
```

> You can also see if running with --tokenizer spacy works.

I don't get your point...
from drqa.
okay, let me try...
from drqa.
Now it is working perfectly... thank you so much! But it is giving me the wrong prediction for some questions:
```
>>> process('when facebook company ipo launched')
08/22/2017 03:49:42 AM: [ Processing 1 queries... ]
08/22/2017 03:49:42 AM: [ Retrieving top 5 docs... ]
08/22/2017 03:49:43 AM: [ Reading 323 paragraphs... ]
08/22/2017 03:49:51 AM: [ Processed 1 queries in 8.7226 (s) ]
Top Predictions:
+------+--------+-------------------------------------+--------------+-----------+
| Rank | Answer | Doc                                 | Answer Score | Doc Score |
+------+--------+-------------------------------------+--------------+-----------+
|  1   |  2009  | Initial public offering of Facebook |    49060     |  248.07   |
+------+--------+-------------------------------------+--------------+-----------+
Contexts:
[ Doc = Initial public offering of Facebook ]
To ensure that early investors would retain control of the company, Facebook in 2009 instituted a dual-class stock structure. After the IPO, Zuckerberg was to retain a 22% ownership share in Facebook and was to own 57% of the voting shares. The document also stated that the company was seeking to raise $5 billion, which would make it one of the largest IPOs in tech history and the biggest in Internet history.
```
```
>>> process('when facebook company IPO launched')
08/22/2017 03:51:07 AM: [ Processing 1 queries... ]
08/22/2017 03:51:07 AM: [ Retrieving top 5 docs... ]
08/22/2017 03:51:07 AM: [ Reading 323 paragraphs... ]
08/22/2017 03:51:14 AM: [ Processed 1 queries in 6.7024 (s) ]
Top Predictions:
+------+--------+-------------------------------------+--------------+-----------+
| Rank | Answer | Doc                                 | Answer Score | Doc Score |
+------+--------+-------------------------------------+--------------+-----------+
|  1   |  2012  | Initial public offering of Facebook |  4.8931e+05  |  248.07   |
+------+--------+-------------------------------------+--------------+-----------+
Contexts:
[ Doc = Initial public offering of Facebook ]
The social networking company Facebook held its initial public offering (IPO) on Friday, May 18, 2012. The IPO was the biggest in technology and one of the biggest in Internet history, with a peak market capitalization of over $104 billion. Media pundits called it a "cultural touchstone."
```
```
>>> process('who is father of deep learning')
08/22/2017 03:52:47 AM: [ Processing 1 queries... ]
08/22/2017 03:52:47 AM: [ Retrieving top 5 docs... ]
08/22/2017 03:52:48 AM: [ Reading 479 paragraphs... ]
08/22/2017 03:52:55 AM: [ Processed 1 queries in 7.3674 (s) ]
Top Predictions:
+------+---------------------+---------------+--------------+-----------+
| Rank | Answer              | Doc           | Answer Score | Doc Score |
+------+---------------------+---------------+--------------+-----------+
|  1   | Juergen Schmidhuber | Deep learning |  3.7192e+08  |  453.99   |
+------+---------------------+---------------+--------------+-----------+
Contexts:
[ Doc = Deep learning ]
Deep learning algorithms transform their inputs through more layers than shallow learning algorithms. At each layer, the signal is transformed by a processing unit, like an artificial neuron, whose parameters are 'learned' through training. A chain of transformations from input to output is a "credit assignment path" (CAP). CAPs describe potentially causal connections between input and output and may vary in length – for a feedforward neural network, the depth of the CAPs (thus of the network) is the number of hidden layers plus one (as the output layer is also parameterized), but for recurrent neural networks, in which a signal may propagate through a layer more than once, the CAP is potentially unlimited in length. There is no universally agreed upon threshold of depth dividing shallow learning from deep learning, but most researchers in the field agree that deep learning has multiple nonlinear layers (CAP > 2) and Juergen Schmidhuber considers CAP > 10 to be very deep learning.
```
from drqa.
I am glad that it is working.
DrQA is just an AI research project -- of course there is no guarantee that it will answer all questions correctly (or in the case of this model be invariant to spelling, capitalization, or phrasing). In fact from our reported evaluations on several QA datasets, you can expect that DrQA will get most questions wrong (but also a fair amount correct). Hopefully this model can be a baseline for machine reading at scale that someone like you can beat 😉.
Then again, the answers to some of these questions are subjective. Perhaps Juergen wouldn't mind the answer to your question 3...
from drqa.
Okay. Are you improving or working on its QA datasets to give more accurate answers? And one more thing: currently it takes a lot of time to give answers, and I want it to answer in at most 3 seconds. What should I do to achieve this?
from drqa.
Reading comprehension and open-domain QA is an active area of research, for FAIR and others.
To improve the runtime performance of DrQA you will need a machine with better specs. It also scales better with large batches (faster average time per question).

- Ideally you will have a machine with a GPU and cuDNN. The higher quality the GPU, the better.
- Having more CPU cores (especially if you are lacking a GPU) is also very helpful. The prediction pipeline runs on both CPU and GPU. More than 15 cores is good; more if not using a GPU.
- Running with large batch sizes (say, up to 1000 questions) is considerably more efficient than asking single questions. You can see how batching is done in `scripts/pipeline/predict.py`, for example.
- As an immediate measure, you can reduce the number of documents DrQA reads per question (the `n_docs` parameter in `process`; the default is 5). This will hurt your accuracy, however.
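To see why batching improves the average time per question, here is a toy cost model (the numbers are made up for illustration; measure your own pipeline's overhead):

```python
# Toy latency model (illustrative numbers, not DrQA measurements): each
# pipeline invocation pays a fixed overhead (model setup, retrieval batching,
# GPU transfers) plus a marginal cost per question.
FIXED_OVERHEAD_S = 2.0   # assumed fixed cost per call, in seconds
PER_QUESTION_S = 0.5     # assumed marginal cost per question, in seconds

def avg_latency(batch_size):
    """Average seconds per question when batch_size questions share one call."""
    return (FIXED_OVERHEAD_S + PER_QUESTION_S * batch_size) / batch_size

print(avg_latency(1))    # -> 2.5 s/question
print(avg_latency(100))  # -> 0.52 s/question
```

The fixed overhead is amortized across the batch, so the average per-question latency approaches the marginal cost as the batch grows, even though total wall-clock time for the batch is longer.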
from drqa.
😊
from drqa.
Hi, I am having the same issue with 8 GB RAM and 4 CPU cores. Can you help us?

```
(pt) root@ml:~/DrQA# python3 scripts/pipeline/interactive.py --tokenizer regexp
Traceback (most recent call last):
  File "scripts/pipeline/interactive.py", line 16, in <module>
    from drqa import pipeline
ImportError: No module named 'drqa'
```
from drqa.