Comments (12)
After reading #409, I also tried manually removing the pg_data/pg_vector index files on disk and running `REINDEX database;`, which seemed to work.
With 15M vectors, `EXPLAIN ANALYZE` on a faceted search took roughly 23 seconds before indexing and 6 seconds after.
from pgvecto.rs.
I'm not going to close this yet, only because there might be some interesting data in here for debugging why the first index pass fails.
from pgvecto.rs.
Interesting. It seems like reindexing actually crashes but it happens in the background.
from pgvecto.rs.
What's your hardware? How much memory do you have?
from pgvecto.rs.
Some part of it may already have crashed. Can you try it with a fresh database? Or manually delete all files under `pgdata/pgvecto_rs` and run `REINDEX`?
from pgvecto.rs.
Hi There!
I'm running Docker on an M2 Mac Pro with 32 GB of RAM. Docker has 5 CPUs and 20 GB allocated.
I was able to run `tensorchord/pgvecto-rs:pg16-v0.3.0-alpha.1`, and while it seems to build the index from the get-go, it's not obvious to me from `EXPLAIN ANALYZE` whether the index is being used in our queries.
Manually deleting that folder and running `REINDEX` did work for 0.2.1, but it seems unnecessary for 0.3.0.
Question: After `create index` has returned / completed, should I expect higher-than-idle CPU usage on the Postgres container? It looks like indexing is still running.
Thanks for any insight @VoVAllen
from pgvecto.rs.
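One way to answer the "is the index being used?" question above is to look at the node types in the `EXPLAIN ANALYZE` text. The following is a minimal sketch, not something from this thread: it relies only on PostgreSQL's standard plan node labels (`Index Scan using ... on ...`, `Bitmap Index Scan on ...`, `Seq Scan on ...`). The function name `summarize_plan`, the table `items`, and the index `items_embedding_idx` are hypothetical, and the sample plan strings are illustrative rather than real output.

```python
import re

# Standard PostgreSQL plan node labels. Group 2 captures the index name.
_INDEX_NODE = re.compile(
    r"(Index Only Scan|Index Scan|Bitmap Index Scan)(?: Backward)? (?:using|on) (\S+)"
)
# Sequential scans name the table directly.
_SEQ_NODE = re.compile(r"Seq Scan on (\S+)")


def summarize_plan(plan_text):
    """Return the indexes and sequentially scanned tables named in a plan."""
    indexes = [m.group(2) for m in _INDEX_NODE.finditer(plan_text)]
    seq_scans = _SEQ_NODE.findall(plan_text)
    return {"indexes": indexes, "seq_scans": seq_scans}


# Illustrative plan text only (hypothetical table and index names).
INDEXED_PLAN = """\
Limit
  ->  Index Scan using items_embedding_idx on items
"""

UNINDEXED_PLAN = """\
Limit
  ->  Sort
        ->  Seq Scan on items
"""

if __name__ == "__main__":
    print(summarize_plan(INDEXED_PLAN))
    print(summarize_plan(UNINDEXED_PLAN))
```

If `indexes` is empty and the vector table shows up under `seq_scans`, the query is most likely falling back to a brute-force scan.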
The behavior changed between 0.2.1 and 0.3.0. In 0.2.1, the HNSW index is constructed asynchronously. So when `create index` finishes, queries are initially served by a brute-force scan while the real HNSW index is built in background threads. Once construction finishes, queries use the HNSW index and become much faster.
> Question: After running create index, and it returns / completed, should I expect higher than idle CPU usage on the Postgres container? I note that it seems like Indexing is still running.
Yes, it's still running in the background process.
In 0.3.0, we decided to let `create index` finish only once the real index is constructed. So `create index` takes much longer, but queries use the index directly after that.
from pgvecto.rs.
0.2.1 is a stable version. You can use `SELECT * FROM pg_vector_index_stat;` to check whether the real index is finished.
from pgvecto.rs.
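Because 0.2.1 keeps building the index after `create index` returns, a benchmark script may want to wait before timing queries. The sketch below is a generic polling helper under stated assumptions: `wait_until` and `is_ready` are hypothetical names, and `is_ready` stands in for whatever check you run against `pg_vector_index_stat`. The thread does not say which column of that view signals completion, so that part is left to the caller.

```python
import time


def wait_until(is_ready, timeout_s=600.0, interval_s=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call is_ready() repeatedly until it returns True or the timeout expires.

    is_ready is a caller-supplied callable (hypothetical here); in practice it
    would query pg_vector_index_stat and decide whether the index is built.
    Returns True if the check succeeded, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in check that succeeds on the third poll, for demonstration only.
    polls = {"n": 0}

    def fake_check():
        polls["n"] += 1
        return polls["n"] >= 3

    print(wait_until(fake_check, timeout_s=10.0, interval_s=0.01))
```

Injecting `sleep` and `clock` keeps the helper easy to test without a database or real delays.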
What error did you hit on 0.3?
from pgvecto.rs.
Thank you @VoVAllen for the information about the differences between 0.3.0 and 0.2.1. I'm happy testing on the alpha, since for now we are able to be flexible.
Right now, with 0.3.0, I'm not sure I'm seeing the performance I expect, but moving from 0.2.1 to 0.3.0 has removed all crashes and disconnections from our psql client, which is awesome!
from pgvecto.rs.
Also, regarding performance: I'm not implying pgvecto.rs is slow, only that we are trying to find settings that work for our expected load. I suspect this can be closed, as the main crashing issue is resolved and most of my remaining concerns are better suited to a Discord conversation about expected performance.
Thank you again!
from pgvecto.rs.
> Also, re performance, im not implying PGVecto.rs is slow, mostly that we are trying to find settings that work for our expected load. I suspect this can be closed as the main crashing issue is resolved, and most of my concerns are more suitable for Discord conversation / educating me on expected performance.
> Thank you again!
Questions about performance are welcome in Discord!
from pgvecto.rs.