Comments (7)
Hey @retinio,
There are two optimizer parameters:
"deleted_threshold": 0.2,
"vacuum_min_vector_number": 1000,
which define, per segment, the conditions under which deleted vectors trigger the optimizer.
Please note that once deleted, vectors do not affect search results in any way.
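To make the per-segment condition concrete, here is a minimal sketch in Python. It is not Qdrant's actual code: the `SegmentStats` class and the `needs_vacuum` function are hypothetical names, and the exact comparison operators are an assumption. It only illustrates the idea that a segment must both be large enough and have a high enough fraction of deleted vectors.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    """Hypothetical per-segment counters, for illustration only."""
    total_vectors: int
    deleted_vectors: int


def needs_vacuum(segment, deleted_threshold=0.2, vacuum_min_vector_number=1000):
    """Return True if this one segment would qualify for cleanup (assumed logic)."""
    # Segments that are empty or below the minimum size are never considered.
    if segment.total_vectors == 0 or segment.total_vectors < vacuum_min_vector_number:
        return False
    # Otherwise the fraction of deleted vectors must exceed the threshold.
    deleted_ratio = segment.deleted_vectors / segment.total_vectors
    return deleted_ratio > deleted_threshold


segments = [
    SegmentStats(total_vectors=800, deleted_vectors=600),    # 75% deleted, but too small
    SegmentStats(total_vectors=5000, deleted_vectors=500),   # big enough, only 10% deleted
    SegmentStats(total_vectors=5000, deleted_vectors=1500),  # big enough, 30% deleted
]
flags = [needs_vacuum(s) for s in segments]
print(flags)  # [False, False, True]
```

Because the check is made per segment, a collection can hold many deleted vectors in total while no single segment meets both conditions.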
from qdrant.
@generall
If my collection has more deleted vectors than points, will Qdrant's optimization never run?
from qdrant.
As Andrey mentioned, those parameters are per segment. You have quite a few segments (48), so with this number of points it makes sense that the optimizer has not run yet. I wouldn't worry about it.
It is done this way because actually removing the vectors from disk immediately is more expensive than keeping them until enough have been deleted.
Also note that the point/vector counts are approximate and should not be relied upon. That is described here.
from qdrant.
@timvisee Thanks!
I have one more question.
I tried to reduce the number of segments by setting the following in the config:
storage:
  optimizers:
    # If the number of segments exceeds this value, the optimizer will merge the smallest segments.
    max_segment_number: 5
Do I understand correctly that when the optimizer runs, my 48 segments will be merged?
from qdrant.
Yes. According to the collection info you shared above, you have "default_segment_number": 0,
which means it is chosen automatically. It defaults to the number of CPUs you have, which is likely why it is 48. Changing the above value as you suggested should reduce it further.
When changing this, you do need to trigger the optimizers at least once. I have drafted a documentation page on how you could do that; you can see a preview of it here. Sending another update operation, such as upserting a point, is fine as well.
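As a rough sketch of what such a trigger could look like, the Python snippet below builds (but does not send) an update request for the collection. Everything here is an assumption rather than the content of the drafted page: the URL `http://localhost:6333`, the collection name `my_collection`, the helper `build_optimizer_update`, and the idea that updating `optimizers_config` with `default_segment_number` over the REST API is accepted by your Qdrant version and re-triggers the optimizers.

```python
import json
import urllib.request


def build_optimizer_update(base_url, collection, default_segment_number):
    """Build a PATCH request that updates the collection's optimizer config.

    Hypothetical helper; the request body shape is an assumption.
    """
    body = {"optimizers_config": {"default_segment_number": default_segment_number}}
    return urllib.request.Request(
        url=f"{base_url}/collections/{collection}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# Placeholder URL and collection name; replace with your own.
request = build_optimizer_update("http://localhost:6333", "my_collection", 5)
print(request.get_method(), request.full_url)

# Uncomment to actually send the request to a running instance:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the payload first. Check the result afterwards by looking at the segment count in the collection info.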
Note that a small number of segments is fine as long as you have a low number of points. If you plan to scale your setup, you likely want to stick to the default of 48 (your number of CPUs).
from qdrant.
thank you for answering @timvisee
from qdrant.
"It defaults to the number of CPUs you have, which is likely why it is 48."
More precisely, it is the number of shards times the number of CPUs.
from qdrant.