matsui528 / nanopq
Pure python implementation of product quantization for nearest neighbor search
License: MIT License
I wanted to learn how product quantization works, and this repository provided excellent code for understanding it. As I have been learning Rust for a few months, I decided to rewrite the pq.py script in Rust to understand each step thoroughly by implementing it myself. Here's the repository containing the Rust code: shubham0204/pq.rs.
The following steps have to be taken in order to complete the project:
README.md and add a small usage sample of the Rust API
crates.io
Do let me know if the repository can be included as a community resource. Like me, many other learners would like to study implementations of product quantization in languages other than Python, and a section listing implementations in other languages would be of great help. Moreover, I'm also working on a detailed blog post which will explain product quantization from first principles, with a Rust implementation.
Thanks for your work.
```python
import nanopq
import numpy as np

N, Nt, D = 10000, 2000, 128
X = np.random.random((N, D)).astype(np.float32)  # 10,000 128-dim vectors to be indexed
Xt = np.random.random((Nt, D)).astype(np.float32)  # 2,000 128-dim vectors for training
query = np.random.random((D,)).astype(np.float32)  # a 128-dim query vector

pq = nanopq.PQ(M=8, Ks=256)
pq.fit(Xt, seed=123)
X_code = pq.encode(X)  # (10000, 8) with dtype=np.uint8
X_reconstructed = pq.decode(codes=X_code)

tmp = X[0]
tmp1 = X_reconstructed[0]
dis = np.sqrt(np.sum(np.square(tmp - tmp1)))
```
The value of dis is about 2.0+. Does that look right?
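One way to sanity-check a number like this is to compare it with the error of the crudest possible quantizer, one that maps every vector to the dataset mean. The sketch below uses only NumPy (it does not call nanopq) and assumes the same kind of uniform random data as the snippet above. For uniform data in [0, 1) each dimension has variance 1/12, so the root-mean-square distance to the mean should be close to sqrt(D / 12), roughly 3.27 for D = 128. A reconstruction error of about 2 sits below that baseline, which is at least consistent with the codebook doing useful work on data that has no structure to exploit.

```python
import numpy as np

rng = np.random.default_rng(123)
N, D = 10000, 128
X = rng.random((N, D)).astype(np.float32)  # uniform random vectors, as in the question

# "Quantizer" with a single codeword: every vector is reconstructed as the mean.
mean = X.mean(axis=0)
per_vector_err = np.linalg.norm(X - mean, axis=1)

# Root-mean-square reconstruction error of that trivial quantizer.
baseline_rms = float(np.sqrt(np.mean(per_vector_err ** 2)))

# Theoretical value for uniform [0, 1) data: each dimension has variance 1/12.
theoretical = float(np.sqrt(D / 12.0))

print(f"baseline RMS error: {baseline_rms:.3f}, sqrt(D/12): {theoretical:.3f}")
```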
How to compute codes similar to FAISS using NanoPQ without using FAISS
I am looking into computing a centroid of centroids using NanoPQ. Is it possible? I have a first-level nanopq model with M=4, K=16, D=24. The codewords it produces have shape (4, 16, 6). Can this output be sent as input to a second-level NanoPQ to calculate centroids of centroids? The reason for investigating this is to reduce processing time on large datasets.
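As a rough sketch of what feeding the codewords into a second level could look like: a (4, 16, 6) array has to become a 2-D array of training vectors first. The example below uses a random stand-in for the first-level codewords and only shows the two obvious reshapes. Whether pooling codewords from different subspaces is meaningful depends on the data, and this is an illustration under those assumptions, not a recommendation from the library.

```python
import numpy as np

M, Ks, Ds = 4, 16, 6  # first-level shape from the question: D = M * Ds = 24
rng = np.random.default_rng(0)

# Stand-in for the first-level codewords, shape (M, Ks, Ds) = (4, 16, 6).
codewords = rng.random((M, Ks, Ds)).astype(np.float32)

# Option A: pool every codeword into one training set of 6-dim vectors.
pooled = codewords.reshape(M * Ks, Ds)  # shape (64, 6)

# Option B: keep subspaces apart and cluster each (16, 6) block separately.
per_subspace = [codewords[m] for m in range(M)]

print(pooled.shape, per_subspace[0].shape)
```

With only 64 pooled rows (or 16 per subspace), any second-level codebook would need to be smaller than that number of rows for clustering to make sense.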
Not sure if this should be a feature request.
Suppose I just want to approximate the distance between two PQ codes (under the same encoder, of course). What is the most efficient way to perform such an operation?
Hi friend, I have two questions.
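A common approach to code-to-code distances is symmetric distance computation: precompute, for every subspace, the squared distance between each pair of codewords, then sum table lookups. The sketch below is NumPy-only and uses a random array as a stand-in for a fitted encoder's codewords, assumed to have shape (M, Ks, Ds) like the (4, 16, 6) array mentioned in another issue here. The helper names `build_sdc_table` and `sdc_distance` are hypothetical and not part of nanopq.

```python
import numpy as np

def build_sdc_table(codewords):
    """Pairwise squared L2 distances between codewords, per subspace.

    codewords: array of shape (M, Ks, Ds)
    returns:   array of shape (M, Ks, Ks)
    """
    diff = codewords[:, :, None, :] - codewords[:, None, :, :]
    return np.sum(diff ** 2, axis=-1)

def sdc_distance(table, code_a, code_b):
    """Squared distance between the reconstructions of two PQ codes."""
    m_idx = np.arange(table.shape[0])
    return float(np.sum(table[m_idx, code_a, code_b]))

rng = np.random.default_rng(0)
M, Ks, Ds = 8, 256, 16
codewords = rng.random((M, Ks, Ds)).astype(np.float32)  # stand-in, not a trained codebook

code_a = rng.integers(0, Ks, size=M).astype(np.uint8)
code_b = rng.integers(0, Ks, size=M).astype(np.uint8)

table = build_sdc_table(codewords)           # built once per encoder
d2 = sdc_distance(table, code_a, code_b)     # M lookups per pair of codes
print(d2)
```

The table costs M * Ks * Ks floats to store, after which each distance needs only M lookups and no decoding. The result equals the squared distance between the two decoded vectors, so it carries the quantization error of both codes.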
```python
def test_parametric_init(self):
    N, D, M, Ks = 100, 12, 4, 10
    X = np.random.random((N, D)).astype(np.float32)

    # Reconstruction error without parametric initialization
    opq = nanopq.OPQ(M=M, Ks=Ks)
    opq.fit(X, parametric_init=False, rotation_iter=1)
    err_init = np.linalg.norm(opq.rotate(X) - opq.decode(opq.encode(X)))

    # Reconstruction error with parametric initialization
    opq = nanopq.OPQ(M=M, Ks=Ks)
    opq.fit(X, parametric_init=True, rotation_iter=1)
    err = np.linalg.norm(opq.rotate(X) - opq.decode(opq.encode(X)))

    self.assertLess(err_init, err)
```
self.pq.decode(codes) @ self.R.T
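The line above multiplies the decoded vectors by the transpose of R. A minimal sketch of why that undoes a rotation, assuming R is an orthogonal matrix (as a rotation is meant to be) and that rotating means right-multiplying by R: for orthogonal R, R @ R.T is the identity, so (X @ R) @ R.T recovers X. The matrix here is a random orthogonal one built with a QR decomposition, not one learned by OPQ.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 12

# Random orthogonal matrix (stand-in for a learned rotation R).
R, _ = np.linalg.qr(rng.standard_normal((D, D)))

X = rng.random((5, D))
X_rot = X @ R          # rotate into the space where quantization happens
X_back = X_rot @ R.T   # rotate back: R.T is the inverse of an orthogonal R

print(np.allclose(X, X_back))
```

Note that in the test above the error is measured against `opq.rotate(X)`, that is, in the rotated space. Because an orthogonal rotation preserves Euclidean norms, the norm of a difference is the same whether both sides are compared before or after rotating back.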
I think this should be 8 bits, not 256! Otherwise the package is very helpful, thanks!
Line 22 in 5c9e138
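Assuming the remark concerns the usual setting of Ks = 256 codewords per subspace, the arithmetic can be checked directly: an index into 256 codewords needs log2(256) = 8 bits, which is one byte and fits a uint8. The values of M and Ks below are taken from the example earlier on this page.

```python
import math

M, Ks = 8, 256  # sub-spaces and codewords per sub-space

bits_per_subcode = math.ceil(math.log2(Ks))  # bits needed to index Ks codewords
bits_per_vector = M * bits_per_subcode       # one sub-code per sub-space
bytes_per_vector = bits_per_vector // 8

print(bits_per_subcode, bits_per_vector, bytes_per_vector)
```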
Replace Travis with GitHub Actions
The Optimized PQ class takes a verbose flag, and this is passed to the inner PQ class. However, it would make sense if OPQ also obeyed this flag in its fit method, which outputs lots of things about "Reconstruction error".
Hi!
Thanks for writing this package, it looks great!
I'd be interested in turning the print statements (with verbose=True) into logging statements. The verbose flag could then be used to control whether this logging is output to stdout (i.e., by setting the log level). Is this something you are interested in? If so, I could submit a PR.
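A minimal sketch of what that proposal could look like, using only the standard library. The logger name `pq_demo`, the function `fit_demo`, and the placeholder error values are hypothetical and only illustrate the pattern; this is not nanopq's code.

```python
import logging

logger = logging.getLogger("pq_demo")  # hypothetical logger name

def fit_demo(n_iter, verbose=False):
    """Toy training loop that logs progress instead of printing it."""
    # The verbose flag only selects the log level; handlers decide where output goes.
    logger.setLevel(logging.INFO if verbose else logging.WARNING)
    errors = []
    for i in range(n_iter):
        err = 1.0 / (i + 1)  # placeholder for a real reconstruction error
        logger.info("iteration %d: reconstruction error = %.3f", i, err)
        errors.append(err)
    return errors

logging.basicConfig()          # attach a default stream handler to the root logger
fit_demo(3, verbose=True)      # emits three INFO records
fit_demo(3, verbose=False)     # emits nothing
```

One design consideration: many libraries avoid setting levels themselves and leave that to the application, in which case the flag could be dropped and callers would configure the logger directly.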
How can we determine the ideal value for the Ks parameter?
When using scipy == 1.3.3, the error occurs.
When I downgrade scipy to version 1.2.1, it runs well.
Hi! I like your code a lot! But I have one question: why does the test fail when I change the parameters in test_parametric_init to N, D, M, Ks = 100, 12, 4, 20?
Do you have any idea? I thought the rotation matrix would work with different values of M.