
Comments (14)

SAIVENKATARAJU avatar SAIVENKATARAJU commented on May 20, 2024

Hi,
I understand the issue: I need to pass all as the stage argument. However, with the following command the process runs forever, making no progress and never stopping.
Screenshot from 2021-10-15 17-06-30

from densephrases.

jhyuklee avatar jhyuklee commented on May 20, 2024

Hi @SAIVENKATARAJU, what is the version of faiss? Did you install faiss-gpu?
For the small version of phrase indexes, this shouldn't take more than several minutes.
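One way to answer the version question is to ask the Python packaging metadata directly. The following is a minimal sketch, not part of DensePhrases; the helper name installed_versions is hypothetical, and it assumes faiss was installed with pip under the distribution names faiss-gpu or faiss-cpu (a conda install may register a different name).

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed under this exact name.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust them to match your environment.
    report = installed_versions(["faiss-gpu", "faiss-cpu", "torch"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

If both faiss-gpu and faiss-cpu report a version, the two builds may be shadowing each other, which is worth ruling out before debugging further.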


SAIVENKATARAJU avatar SAIVENKATARAJU commented on May 20, 2024

Hi @jhyuklee
Thanks for your comment. I am using faiss-gpu, version 1.65.


SAIVENKATARAJU avatar SAIVENKATARAJU commented on May 20, 2024

Hi @jhyuklee
I debugged the code further and got the error shown in the screenshot below. I'm not sure what I am missing here. For debugging purposes I changed the script and set the arguments inside the code itself, like this. I hope this is helpful. I am attaching my article.json here.

args.dump_dir='./outputs/densephrases-multi_sample/dump'
args.stage='all'
args.replace=True
args.num_clusters=32
args.fine_quant='OPQ96'
args.doc_sample_ratio=1.0
args.vec_sample_ratio=1.0
args.cuda=True
args.index_filter=-1e8
args.index_name='start'
args.quantizer_path='quantizer.faiss'
args.trained_index_path='trained.faiss'
args.inv_path='merged.invdata'
args.subindex_name='index'
args.dump_paths='./densephrases-multi_sample/dump/phrase'
args.phrase_dir='phrase'
args.subindex_dir='./densephrases-multi_sample/dump/phrase/'
args.offset=0
args.norm_th=999

Screenshot from 2021-10-16 15-56-52


jhyuklee avatar jhyuklee commented on May 20, 2024

I'm not sure why, but the error message says you are trying to use hnsw. hnsw is disabled by default, so this shouldn't occur. Could you check whether the command passes through these lines?

opq_matrix = faiss.OPQMatrix(ds, code_size)
opq_matrix.niter = 10
sub_index = faiss.IndexIVFPQ(quantizer, ds, num_clusters, code_size, 8, faiss.METRIC_INNER_PRODUCT)
start_index = faiss.IndexPreTransform(opq_matrix, sub_index)


SAIVENKATARAJU avatar SAIVENKATARAJU commented on May 20, 2024

Hi @jhyuklee

The hnsw type error is resolved. However, the problem from the second screenshot I posted, where the process hangs forever, still persists. It is happening because of a deadlock. Please find the screenshots below. I am attaching a sample article.json and the commands that I used. If you have time, you can go through them.
Command 1:
python DensePhrases/generate_phrase_vecs.py --model_type bert --pretrained_name_or_path SpanBERT/spanbert-base-cased --data_dir ./ --cache_dir $CACHE_DIR --predict_file article_original.json --do_dump --max_seq_length 512 --doc_stride 500 --fp16 --filter_threshold -2.0 --append_title --load_dir $SAVE_DIR/densephrases-multi --output_dir $SAVE_DIR/densephrases-multi_sample

Command 2:
python DensePhrases/build_phrase_index.py --dump_dir $SAVE_DIR/densephrases-multi_sample/dump --stage all --replace --num_clusters 32 --fine_quant OPQ96 --doc_sample_ratio 1.0 --vec_sample_ratio 1.0 --cuda

deadlock
dead_lock

article_original.zip


jhyuklee avatar jhyuklee commented on May 20, 2024

Hi @SAIVENKATARAJU, I've downloaded your article_original.json and found that it is too small to train the index. I got the following error:

RuntimeError: Error in void faiss::Clustering::train_encoded(faiss::Clustering::idx_t, const uint8_t*, const faiss::Index*, faiss::Index&, const float*) at /__w/faiss-wheels/faiss-wheels/faiss/faiss/Clustering.cpp:275: Error: 'nx >= k' failed: Number of training points (29) should be at least as large as number of clusters (256)

If you really want to build a phrase index for a corpus this small, try exact search instead of IVFPQ.
You can do this by setting --fine_quant none for build_phrase_index. However, you will also need to fix the relevant parts of https://github.com/princeton-nlp/DensePhrases/blob/main/densephrases/index.py, because you will be using faiss.IndexFlatIP, not faiss.IndexIVFPQ. So I suggest increasing the size of the corpus instead.
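The 'nx >= k' failure above can be anticipated before training starts. The sketch below is an illustration only: the function pick_index_kind and its parameters are hypothetical and not part of DensePhrases or faiss. It assumes the 256 in the error message is either the coarse cluster count or the codebook size of an 8-bit product quantizer (2^8 = 256), and checks against whichever is larger.

```python
def pick_index_kind(num_vectors, num_clusters, pq_bits=8):
    """Suggest 'ivfpq' when there are enough training vectors, else 'flat'.

    k-means training needs at least as many points as centroids. An IVFPQ
    index trains both the coarse quantizer (num_clusters centroids) and the
    product quantizer (2 ** pq_bits centroids per sub-quantizer), so the
    larger of the two is the minimum number of training vectors.
    """
    pq_centroids = 2 ** pq_bits
    required = max(num_clusters, pq_centroids)
    return "ivfpq" if num_vectors >= required else "flat"


if __name__ == "__main__":
    # 29 training points, as reported in the error message above.
    print(pick_index_kind(num_vectors=29, num_clusters=32))
    # A larger, hypothetical corpus.
    print(pick_index_kind(num_vectors=100_000, num_clusters=32))
```

This is only the minimum needed to avoid the assertion. A corpus barely above the threshold would still give a poorly trained quantizer, which is consistent with the advice to increase the corpus size.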

I didn't encounter any problem with article.json (default file). By the way, thanks for correcting the command for build_phrase_index.py.


SAIVENKATARAJU avatar SAIVENKATARAJU commented on May 20, 2024

Hi @jhyuklee

Thanks for your comments.

"I've downloaded your article_original.json and found out that the size of it is too small to train the index."

I am puzzled, because the default file and article_original.json are the same. I only renamed it, since my own custom data is named article.json. How come you did not see any error with the default file, but did with the renamed one?

Also, did you get any deadlock while building the phrase index, as shown above?


jhyuklee avatar jhyuklee commented on May 20, 2024

No, I didn't. It seems like article_original.json is much smaller than articles.json. If you want to see a detailed error message, you can set start_index.verbose = True in place of the line
start_index.verbose = False
and gpu_index.verbose = True in place of the line
gpu_index.verbose = False
Make sure you are using GPUs to create the index.


jhyuklee avatar jhyuklee commented on May 20, 2024

Oh, it seems like you just copied and created the JSON file.
You should use this one: https://github.com/princeton-nlp/DensePhrases/blob/main/examples/create-custom-index/articles.json.


jhyuklee avatar jhyuklee commented on May 20, 2024

Hi @SAIVENKATARAJU, is your problem solved?


SAIVENKATARAJU avatar SAIVENKATARAJU commented on May 20, 2024

Hi,
No, unfortunately I am still stuck with the deadlock mentioned above. I get the deadlock even with your articles.json. I saw your post on haystack. Is there any near-term plan to integrate this into haystack?


jhyuklee avatar jhyuklee commented on May 20, 2024

Yes, but that will take some time. I think there could be a problem with the faiss installation. You can try re-installing faiss and pytorch. Their dependencies sometimes conflict.
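To narrow down where a run like this hangs, Python's standard faulthandler module can periodically dump the stack of every thread. This is a minimal sketch under the assumption that you can add two calls near the top of the script being debugged; the helper names watch_for_hang and stop_watching are hypothetical. It only shows Python frames, so if the hang is inside native faiss code it will point at the Python line that made the call rather than the C++ location.

```python
import faulthandler
import sys


def watch_for_hang(timeout_s=300, stream=None):
    """Dump all thread tracebacks every `timeout_s` seconds until cancelled.

    `stream` must be a real file object with a file descriptor;
    it defaults to the current sys.stderr.
    """
    if stream is None:
        stream = sys.stderr
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=stream)


def stop_watching():
    """Cancel the periodic dump once the slow section has finished."""
    faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    watch_for_hang(timeout_s=300)
    # ... run the index-building code here ...
    stop_watching()
```

If the repeated dumps always show the same line, that line is the likely place where the process is blocked, which can help distinguish an installation conflict from a data problem.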


jhyuklee avatar jhyuklee commented on May 20, 2024

Keep an eye on this issue if you need it: deepset-ai/haystack#1721

