
Comments (7)

generall avatar generall commented on July 3, 2024

Hey @retinio,

there are optimization parameters:

                "deleted_threshold": 0.2,
                "vacuum_min_vector_number": 1000,

which define the per-segment condition under which deleted vectors trigger the optimizer.

Please note that once deleted, vectors do not affect search results in any way.
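
As a rough illustration of the per-segment condition described above, here is a minimal sketch. It assumes the two parameters are combined with a logical AND (the segment must hold at least `vacuum_min_vector_number` vectors, and its deleted fraction must exceed `deleted_threshold`). `SegmentStats` and `should_vacuum` are hypothetical names used for illustration only; they are not part of Qdrant's API, and the real optimizer may count vectors differently.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    """Hypothetical per-segment counters; not a Qdrant API type."""
    total_vectors: int    # vectors stored in the segment, deleted ones included
    deleted_vectors: int  # vectors marked as deleted but still stored


def should_vacuum(seg: SegmentStats,
                  deleted_threshold: float = 0.2,
                  vacuum_min_vector_number: int = 1000) -> bool:
    """Assumed rule: the segment is large enough AND enough of it is deleted."""
    if seg.total_vectors == 0 or seg.total_vectors < vacuum_min_vector_number:
        return False  # empty or too small to be worth rewriting
    deleted_ratio = seg.deleted_vectors / seg.total_vectors
    return deleted_ratio > deleted_threshold


small = SegmentStats(total_vectors=500, deleted_vectors=400)
large = SegmentStats(total_vectors=5000, deleted_vectors=1500)
print(should_vacuum(small))  # False: 80% deleted, but under 1000 vectors
print(should_vacuum(large))  # True: 30% deleted and over 1000 vectors
```

Under this assumption, a collection split into many small segments can accumulate a large share of deleted vectors without any single segment qualifying.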

from qdrant.

retinio avatar retinio commented on July 3, 2024

@generall
If my collection has more deleted vectors than points, does that mean Qdrant's optimization never runs?


timvisee avatar timvisee commented on July 3, 2024

As Andrey mentioned, those parameters are per segment. You have quite a few segments (48), so with this number of points it makes sense that the optimizer has not run yet. I wouldn't worry about it.

It is done this way because actually removing the vectors from disk immediately is more expensive than keeping them until enough have been deleted.

Also note that the point/vector counts are approximate and should not be relied upon. That is described here.


retinio avatar retinio commented on July 3, 2024

@timvisee Thanks!
I have one more question.
I have tried to reduce the number of segments by setting this in the config:

storage:
    optimizers:

      # If the number of segments exceeds this value, the optimizer will merge the smallest segments.
      max_segment_number: 5     

Do I understand correctly that when the optimizer runs, my 48 segments will be merged?


timvisee avatar timvisee commented on July 3, 2024

Yes. According to the collection info you shared above, you have "default_segment_number": 0, which means it's chosen automatically. It defaults to the number of CPUs you have, which is likely why it is 48. Changing the above value as you suggested should reduce it further.

When changing this, you do need to trigger the optimizers at least once. I have drafted a documentation page on how you could do that; you can see a preview of it here. Sending another update operation, such as upserting a point, is fine as well.
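
For the "send another update operation" route, a minimal sketch of a single-point upsert over Qdrant's REST API might look like the following. The base URL, collection name, point ID, and vector are all placeholder assumptions, and `build_upsert_request` is a hypothetical helper. The request is only built here, not sent, because upserting an existing ID overwrites that point.

```python
import json
import urllib.request
from typing import Sequence


def build_upsert_request(base_url: str, collection: str,
                         point_id: int,
                         vector: Sequence[float]) -> urllib.request.Request:
    """Build (but do not send) a PUT request that upserts one point."""
    body = {"points": [{"id": point_id, "vector": list(vector)}]}
    return urllib.request.Request(
        url=f"{base_url}/collections/{collection}/points?wait=true",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# All values below are placeholders; the vector length must match the
# dimensionality configured for your collection.
req = build_upsert_request("http://localhost:6333", "my_collection",
                           1, [0.1, 0.2, 0.3, 0.4])
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

If the deployment requires an API key, the request would need the corresponding header as well; that is left out of this sketch.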

Note that a small number of segments is fine as long as you have a low number of points. If you plan to scale your setup, you likely want to stick to the default of 48 (your number of CPUs).


retinio avatar retinio commented on July 3, 2024

thank you for answering @timvisee


generall avatar generall commented on July 3, 2024

> It defaults to the number of CPUs you have, which is likely why it is 48.

More precisely, it is the number of shards times the number of CPUs.
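
As a quick arithmetic sketch of that rule (`default_segment_count` is a hypothetical helper, not a Qdrant function, and the thread does not say which shard and CPU counts produced 48):

```python
import os


def default_segment_count(shard_count: int, cpu_count: int) -> int:
    """Automatic segment count under the rule: shards times CPUs."""
    return shard_count * cpu_count


# Several different setups would all end up with 48 segments.
print(default_segment_count(1, 48))
print(default_segment_count(2, 24))

# Estimate for the current machine, assuming a single shard.
print(default_segment_count(1, os.cpu_count() or 1))
```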

