Hi friends. Firstly, thanks for this project. the VBASE filtered + semantic search loo


Crash on indexing with 0.2.1 docker container about pgvecto.rs HOT 12 CLOSED

vade commented on September 27, 2024

Crash on indexing with 0.2.1 docker container

from pgvecto.rs.

Comments (12)

vade commented on September 27, 2024

After reading #409, I also tried removing the pg_data/pg_vector indexes manually on disk and then running REINDEX database; which seemed to work.

For 15M vectors, EXPLAIN ANALYZE on a faceted search took roughly 23 seconds before indexing.

After indexing, it took 6 seconds.


vade commented on September 27, 2024

I'm not going to close this, only because there might be some interesting data in here to debug why the first index pass fails.


vade commented on September 27, 2024

Interesting. It seems like reindexing actually crashes, but the crash happens in the background.


VoVAllen commented on September 27, 2024

What's your hardware? How much memory do you have?


VoVAllen commented on September 27, 2024

Some part of the index may have already crashed. Can you try it with a fresh database? Or manually delete all files under pgdata/pgvecto_rs and run REINDEX?


vade commented on September 27, 2024

Hi there!

I'm running Docker on an M2 Mac Pro with 32 GB RAM. Docker has 5 CPUs and 20 GB allocated.

I was able to run tensorchord/pgvecto-rs:pg16-v0.3.0-alpha.1, and while it seems to build the index from the get-go, it's not obvious to me from EXPLAIN ANALYZE whether the index is being used in our queries.

Manually deleting that folder and running REINDEX did work for 0.2.1, but it seems unnecessary for 0.3.0.

Question: After running create index, and it returns / completes, should I expect higher than idle CPU usage on the Postgres container? It seems like indexing is still running.

Thanks for any insight @VoVAllen
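
On the question of whether EXPLAIN ANALYZE shows the index being used: a minimal sketch of one way to check this from the plan's text output. It relies only on the standard PostgreSQL plan node labels ("Seq Scan on ...", "Index Scan using ... on ..."). The table name `items`, the index name `items_embedding_idx`, and the sample plan are hypothetical and only for illustration.

```python
import re

# Matches the scan nodes printed by EXPLAIN in text format, for example
#   "Seq Scan on items"
#   "Index Scan using items_embedding_idx on items"
# Bitmap scans are skipped to keep the sketch small.
SCAN_RE = re.compile(
    r"(?<!Bitmap )(Seq Scan|Index Only Scan|Index Scan)"
    r"(?: Backward)?(?: using (\S+))? on (\S+)"
)


def summarize_scans(plan_text):
    """Return (scan_type, index_name_or_None, relation) for each scan node."""
    return [(m.group(1), m.group(2), m.group(3)) for m in SCAN_RE.finditer(plan_text)]


def uses_index(plan_text, index_name):
    """True if any scan node in the plan goes through the named index."""
    return any(idx == index_name for _, idx, _ in summarize_scans(plan_text))


# Hypothetical plan text, shaped like EXPLAIN output but not taken from this thread.
SAMPLE_PLAN = """\
Limit  (cost=0.00..1.00 rows=10 width=40)
  ->  Index Scan using items_embedding_idx on items  (cost=0.00..1.00 rows=10 width=40)
"""

if __name__ == "__main__":
    print(summarize_scans(SAMPLE_PLAN))
    print(uses_index(SAMPLE_PLAN, "items_embedding_idx"))
```

If the plan only lists "Seq Scan" for the table holding the vectors, the query is not going through the vector index.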


VoVAllen commented on September 27, 2024

The behavior changed between 0.2.1 and 0.3.0. In 0.2.1, the HNSW index is constructed asynchronously. When create index returns, queries are initially served by a brute-force scan while the real HNSW index is built by background threads. Once construction finishes, queries use the HNSW index and you'll see them become much faster.

Question: After running create index, and it returns / completed, should I expect higher than idle CPU usage on the Postgres container? I note that it seems like Indexing is still running.

Yes, it's still running in the background process.

In 0.3.0, we decided to let create index finish only when the real index has been constructed. So create index takes much longer, but queries use the index directly after that.


VoVAllen commented on September 27, 2024

0.2.1 is a stable version. You can use SELECT * FROM pg_vector_index_stat; to check whether the real index is finished.
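
A rough sketch of how that check could be automated on 0.2.1, polling the view until background indexing is done. The query is the one quoted above. The column names `indexname` and `idx_indexing`, the `fetch_rows` callable, and the index name used in the example are assumptions for illustration, so verify them against the actual output of the view before relying on this.

```python
import time

STAT_QUERY = "SELECT * FROM pg_vector_index_stat;"


def wait_for_index(fetch_rows, index_name, poll_seconds=5.0,
                   timeout_seconds=3600.0, sleep=time.sleep):
    """Poll the stat view until the named index is no longer being built.

    fetch_rows is a hypothetical callable that runs a SQL string and returns
    a list of dict-like rows (for example a thin wrapper around a database
    cursor). Returns True once indexing is reported finished, False on timeout.
    """
    waited = 0.0
    while True:
        rows = [r for r in fetch_rows(STAT_QUERY) if r.get("indexname") == index_name]
        # Assumed column: idx_indexing is truthy while the background build runs.
        if rows and all(not r.get("idx_indexing") for r in rows):
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Stand-in for a real database: reports "indexing" once, then "done".
    fake_states = iter([
        [{"indexname": "items_embedding_idx", "idx_indexing": True}],
        [{"indexname": "items_embedding_idx", "idx_indexing": False}],
    ])
    done = wait_for_index(lambda query: next(fake_states), "items_embedding_idx",
                          poll_seconds=0.0)
    print("index ready:", done)
```

Passing the fetch function in keeps the sketch independent of any particular database driver.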


VoVAllen commented on September 27, 2024

What error did you run into on 0.3?


vade commented on September 27, 2024

Thank you @VoVAllen for the information about the differences between 0.3.0 and 0.2.1. I'm happy testing on the alpha, since for now we are able to be flexible.

Right now, with 0.3.0 I'm not sure I'm seeing the performance I expect, but moving to 0.3.0 from 0.2.1 has removed any crashing or disconnections from our PSQL client, which is awesome!


vade commented on September 27, 2024

Also, re performance: I'm not implying PGVecto.rs is slow, mostly that we are trying to find settings that work for our expected load. I suspect this can be closed as the main crashing issue is resolved, and most of my concerns are more suitable for a Discord conversation / educating me on expected performance.

Thank you again!


gaocegege commented on September 27, 2024

Also, re performance: I'm not implying PGVecto.rs is slow, mostly that we are trying to find settings that work for our expected load. I suspect this can be closed as the main crashing issue is resolved, and most of my concerns are more suitable for a Discord conversation / educating me on expected performance.

Thank you again!

Questions about performance are welcome in Discord!

