Comments (14)
Hi,
I understand the issue now: I needed to pass all for the stage argument. However, with the following command the process runs forever, with no progress and without stopping.
from densephrases.
Hi @SAIVENKATARAJU, what is the version of faiss? Did you install faiss-gpu?
For the small version of phrase indexes, this shouldn't take more than several minutes.
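One quick way to answer the version question is to query the installed package metadata. The following is only a minimal sketch: the helper name `installed_versions` is hypothetical, and it assumes the packages were installed with pip under the distribution names `faiss-gpu`, `faiss-cpu`, and `torch` (a conda install may register different names).

```python
from importlib import metadata


def installed_versions(names=("faiss-gpu", "faiss-cpu", "torch")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not registered in this environment.
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

If both faiss-gpu and faiss-cpu show up as installed, that may be worth resolving first, since only one of them should normally be present.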
from densephrases.
Hi @jhyuklee
Thanks for your comment. I am using faiss-gpu version 1.65.
from densephrases.
Hi @jhyuklee
I debugged the code further and got this error. I am not sure what I am missing here. For debugging purposes, I changed the script and set the arguments inside the code itself, like this. I hope this is helpful. I am attaching my article.json here.
args.dump_dir='./outputs/densephrases-multi_sample/dump'
args.stage='all'
args.replace=True
args.num_clusters=32
args.fine_quant='OPQ96'
args.doc_sample_ratio=1.0
args.vec_sample_ratio=1.0
args.cuda=True
args.index_filter=-1e8
args.index_name='start'
args.quantizer_path='quantizer.faiss'
args.trained_index_path='trained.faiss'
args.inv_path='merged.invdata'
args.subindex_name='index'
args.dump_paths='./densephrases-multi_sample/dump/phrase'
args.phrase_dir='phrase'
args.subindex_dir='./densephrases-multi_sample/dump/phrase/'
args.offset=0
args.norm_th=999
from densephrases.
I'm not sure why, but the error message says you are trying to use hnsw. The default setting for hnsw is false, so this shouldn't occur. Could you check if the command passes these lines?
DensePhrases/build_phrase_index.py
Lines 113 to 116 in a8819d9
from densephrases.
Hi @jhyuklee
The hnsw type error is resolved. However, as shown in the second screenshot I posted, the process hangs forever, and that problem still persists. It seems to be caused by a deadlock. Please find the screenshot below. I am attaching a sample article.json and the commands that I used. If you have time, you can go through them.
Command 1:
python DensePhrases/generate_phrase_vecs.py --model_type bert --pretrained_name_or_path SpanBERT/spanbert-base-cased --data_dir ./ --cache_dir $CACHE_DIR --predict_file article_original.json --do_dump --max_seq_length 512 --doc_stride 500 --fp16 --filter_threshold -2.0 --append_title --load_dir $SAVE_DIR/densephrases-multi --output_dir $SAVE_DIR/densephrases-multi_sample
Command 2:
python DensePhrases/build_phrase_index.py --dump_dir $SAVE_DIR/densephrases-multi_sample/dump --stage all --replace --num_clusters 32 --fine_quant OPQ96 --doc_sample_ratio 1.0 --vec_sample_ratio 1.0 --cuda
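When a run hangs like this, it can help to see where the Python process is stuck. The following is a minimal sketch using the standard-library `faulthandler` module; the wrapper name `run_with_hang_watchdog` and the log file name are hypothetical, and the wrapped call shown here is only a stand-in for the index-building step. Note that `faulthandler` reports Python-level frames only, so a hang inside native faiss code would show up as the Python line that called into it.

```python
import faulthandler


def run_with_hang_watchdog(fn, *args, timeout_s=300.0,
                           log_path="hang_traceback.log", **kwargs):
    """Run fn(*args, **kwargs); if it is still running after timeout_s
    seconds, dump the tracebacks of all threads to log_path (and again
    every timeout_s seconds after that)."""
    with open(log_path, "w") as log:
        faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log)
        try:
            return fn(*args, **kwargs)
        finally:
            # Always disarm the watchdog before the log file is closed.
            faulthandler.cancel_dump_traceback_later()


# Stand-in workload; in practice this would wrap the index-building call.
result = run_with_hang_watchdog(sum, [1, 2, 3], timeout_s=60.0)
print(result)
```

If the log file stays empty, the call returned before the timeout; otherwise its contents show which line each thread was executing at that moment.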
from densephrases.
Hi @SAIVENKATARAJU, I've downloaded your article_original.json and found that it is too small to train the index. I got the following error:
RuntimeError: Error in void faiss::Clustering::train_encoded(faiss::Clustering::idx_t, const uint8_t*, const faiss::Index*, faiss::Index&, const float*) at /__w/faiss-wheels/faiss-wheels/faiss/faiss/Clustering.cpp:275: Error: 'nx >= k' failed: Number of training points (29) should be at least as large as number of clusters (256)
If you really want to build a phrase index for such a small corpus, try exact search instead of IVFPQ. This can be done by setting --fine_quant none for build_phrase_index. However, this will also require you to fix the relevant parts of https://github.com/princeton-nlp/DensePhrases/blob/main/densephrases/index.py, because you will be using faiss.IndexFlatIP, not faiss.IndexIVFPQ. So I suggest increasing the size of the corpus instead.
I didn't encounter any problem with article.json (the default file). By the way, thanks for correcting the command for build_phrase_index.py.
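The 'nx >= k' check in the error above can be approximated before launching a build. The following is a rough sketch, not part of DensePhrases or faiss: the function names are hypothetical, and it assumes that the 256 in the error message is the codebook size of an 8-bit product quantizer (2**8 centroids), which is an interpretation rather than something stated in this thread.

```python
def min_training_vectors(num_clusters: int, pq_nbits: int = 8) -> int:
    """Smallest training-set size that would pass an 'nx >= k' check.

    Assumes k-means runs for the coarse quantizer (num_clusters centroids)
    and for each product-quantizer codebook (2**pq_nbits centroids), so the
    larger of the two is the binding requirement.
    """
    return max(num_clusters, 2 ** pq_nbits)


def can_train(num_vectors: int, num_clusters: int, pq_nbits: int = 8) -> bool:
    """True if num_vectors meets the assumed minimum for training."""
    return num_vectors >= min_training_vectors(num_clusters, pq_nbits)


# Numbers from this thread: 29 training points, --num_clusters 32.
print(min_training_vectors(32))  # 256
print(can_train(29, 32))         # False
```

This only covers the hard minimum enforced by the failed assertion; a usable index generally needs far more training vectors than centroids.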
from densephrases.
Hi @jhyuklee
Thanks for your comments.
"I've downloaded your article_original.json and found out that the size of it is too small to train the index. I got the following error as follows:"
I am confused, because the default file and article_original.json are the same. I only renamed it, since my own custom data is named article.json. How come you did not see any error with the default file, but did with the renamed copy?
Also, did you get any deadlock while building the phrase index, as shown above?
from densephrases.
No, I didn't. It seems like article_original.json is much smaller than articles.json. If you want to see a detailed error message, you can set start_index.verbose=True in
DensePhrases/build_phrase_index.py
Line 122 in a983eeb
and gpu_index.verbose=True in
DensePhrases/build_phrase_index.py
Line 129 in a983eeb
from densephrases.
Oh, it seems like you just copied the text and created the JSON file yourself.
You should use this file instead: https://github.com/princeton-nlp/DensePhrases/blob/main/examples/create-custom-index/articles.json.
from densephrases.
Hi @SAIVENKATARAJU, is your problem solved?
from densephrases.
Hi,
No, unfortunately I am still stuck with the deadlock mentioned above. I get the deadlock even with your articles.json. I saw your post on haystack. Is there any near-term plan to integrate this into haystack?
from densephrases.
Yes, but that will take some time. I think there could be a problem with the faiss installation. You can try re-installing faiss and pytorch. Their dependencies sometimes conflict.
from densephrases.
Keep an eye on this issue if you need: deepset-ai/haystack#1721
from densephrases.