Comments (12)

generall commented on June 21, 2024

How do you deploy qdrant in your k8s?

We have several ready-made solutions, such as https://github.com/qdrant/qdrant-helm or hybrid cloud (https://hybrid-cloud.qdrant.tech/), where those problems are all resolved.

from qdrant.

msciancalepore98 commented on June 21, 2024

It's just a simple ECS task deployment, where EFS is used to provide persistence. Is it possible that I need a non sharable disk? I saw in that Helm example that the PVC is of type ReadWriteOnce.

msciancalepore98 commented on June 21, 2024

@generall If I delete all the LOCK files on disk using:

sudo rm */0/segments/*/payload_index/LOCK && sudo rm */0/segments/*/LOCK

and then trigger a rolling update, it goes fine and the new Qdrant instance recreates the LOCK files.

Now, why is this "Failed to load local shard" error happening even when no other Qdrant instance is up at the same time? In that situation no process is holding the LOCK at all, so the new Qdrant instance should be able to access the collections' shards and restore them properly.

Also, when a Qdrant instance is shut down, it should clean up the LOCK files properly. (I can see this locally as well: even when Qdrant is shut down, LOCK files are all over the place. Is there a reason for this? This is even stranger given that I cannot reproduce this panic locally.)
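One possible explanation for LOCK files staying on disk after a clean shutdown, offered as an assumption rather than something confirmed in this thread: if the LOCK files are guarded by POSIX advisory locks (as RocksDB-style LOCK files are), the file itself is only a handle, and the lock disappears when the owning process exits even though the file remains. The sketch below illustrates that behaviour on a throwaway file. `is_locked` and `demo` are hypothetical helper names, not part of Qdrant, and the result on a network filesystem such as EFS depends on how that filesystem implements locking.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Child process: takes an exclusive advisory lock and holds it until stdin closes.
HOLDER = """
import fcntl, sys
f = open(sys.argv[1], "r+b")
fcntl.lockf(f, fcntl.LOCK_EX)
print("locked", flush=True)
sys.stdin.read()
"""


def is_locked(path: str) -> bool:
    """Return True if another process holds an advisory lock on `path`.

    The probe briefly takes the lock itself, so use it only for diagnosis,
    never while a real instance is starting up.
    """
    with open(path, "r+b") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return True
        fcntl.lockf(f, fcntl.LOCK_UN)
        return False


def demo() -> tuple[bool, bool, bool]:
    """Return (locked while holder runs, locked after it exits, file still exists)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "LOCK")  # stand-in file, not a real Qdrant LOCK
        open(path, "wb").close()
        with subprocess.Popen(
            [sys.executable, "-c", HOLDER, path],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        ) as holder:
            holder.stdout.readline()       # wait until the child owns the lock
            while_running = is_locked(path)
            holder.stdin.close()           # let the child exit
            holder.wait()
        after_exit = is_locked(path)
        return while_running, after_exit, os.path.exists(path)


if __name__ == "__main__":
    print(demo())
```

On a local POSIX filesystem the leftover file is harmless under this assumption; a probe like this could help check whether the same holds on the mounted share.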

generall commented on June 21, 2024

Hey @msciancalepore98, I can't give you any guarantees about how Qdrant will work, or what is expected to happen or not, if you continue butchering storage internals like this.

msciancalepore98 commented on June 21, 2024

If you could actually help with proper debugging hints, that would be great as well. I am trying different things to get to the root cause of this behaviour with EFS.

Also, I can't use any auto-managed solution in my environment; I can only deploy tasks on ECS.

generall commented on June 21, 2024

We never tested qdrant on EFS, and I am not sure it is a good idea to use it. Also, I don't know what exactly you are trying to do, but if you are trying to mount the same FS to multiple instances of qdrant, it is not going to work.

ryanlee588 commented on June 21, 2024

I am facing a similar issue. Commenting to stay up to date.

timvisee commented on June 21, 2024

> Is it possible that I need a non sharable disk?

Correct. At least, when it comes to file shares, we recommend not using them.

Also, each instance must have its very own storage directory. These cannot be shared. The cluster itself will take care of sharing all your data across the cluster and putting it in each storage directory separately.

> We never tested qdrant on EFS

@generall In their FAQ they do promise strong consistency and support for proper file locking. But I also feel like we've seen issues with this before.

janicetyp commented on June 21, 2024

Hi @timvisee, I'm encountering a similar issue and wondering if you have any advice on how to preserve the existing collections while resolving the LOCK error. The Qdrant instance we've got running on ECS keeps crashing for this reason, and I don't see a way to resolve it without rebuilding the whole thing from scratch. TIA, appreciate the help!

To add a bit more info: we're deploying Qdrant on ECS with an EFS mount. We were facing the "too many open files" error and increased the limit to 120k, but soon after we encountered a disk quota error. After consulting the Qdrant Discord, we tried updating from 1.6.1 to 1.9.0, which did not resolve the issue. We are now facing this LOCK problem after reverting to version 1.6.1 with the same setup.
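When raising the open-files limit as described here, it can help to confirm what limit a process inside the container actually sees, since a setting applied at one layer does not always reach the process. A minimal sketch using Python's standard `resource` module (Unix only); `nofile_limits` and `describe` are hypothetical helper names, and this reports the limits of the Python process running it, not of Qdrant itself.

```python
import resource


def nofile_limits() -> tuple[int, int]:
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit: int) -> str:
    """Render a limit value, mapping the platform's 'no limit' sentinel to text."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

Running this inside the same container image and task definition gives a quick check that the intended value (120k in this report) is in effect.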

timvisee commented on June 21, 2024

And this happens on every restart, and you're 100% sure you don't have another instance running on the same data?

To be honest, I'm not entirely sure. We haven't hit this ourselves yet.

You might end up having to purge lock files yourself, but I have no idea what other damage that might do.

pvieito commented on June 21, 2024

Hey @timvisee @generall:

> And this happens on every restart, and you're 100% sure you don't have another instance running on the same data?

This is an issue, for example, when you deploy Qdrant in a service that automatically monitors and relaunches it on failure, like ECS or Kubernetes. Imagine that Kubernetes is doing a health check on the Qdrant endpoint. The check starts to fail, so Kubernetes launches a new Qdrant to replace the old one and connects it to the same storage, but the storage still has the LOCKs from the failed instance. Qdrant should have some sort of env-var or configuration to do a clean-up on start and remove any locks from previous failed instances / runs.

timvisee commented on June 21, 2024

> Qdrant should have some sort of env-var or configuration to do a clean-up on start and remove any locks from previous failed instances / runs.

As far as I'm aware, it does this already.

Running locally and killing Qdrant with kill -9 doesn't reproduce this. We don't see this problem in normal k8s operation either. That's why I wonder whether locking on EFS is as good as they promise it to be.

Or are you saying the failed instance is still running while the new instance starts? In that case this would be expected behavior and that should be prevented.

I'll try to do some debugging later to see whether I can catch the same problem.
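If the overlap case is the cause (the old instance still running while its replacement starts on the same storage), one way to rule it out on ECS is to make rolling deployments stop the old task before launching the new one. The sketch below is an assumption about how that could be done, not something confirmed in this thread: it sets the service's `deploymentConfiguration` so that, for a service with a desired count of 1, no second task can run alongside the first. `stop_before_start` is a hypothetical helper, and the client is passed in so that a boto3 ECS client (or a stub) can be used.

```python
from typing import Any, Protocol


class EcsClient(Protocol):
    """Minimal shape of the client this sketch needs (boto3's ECS client fits)."""

    def update_service(self, **kwargs: Any) -> dict: ...


def stop_before_start(ecs: EcsClient, cluster: str, service: str) -> dict:
    """Ask ECS to stop the running task before starting its replacement.

    maximumPercent=100 forbids running extra tasks during a deployment;
    minimumHealthyPercent=0 allows the old task to be stopped first.
    """
    return ecs.update_service(
        cluster=cluster,
        service=service,
        deploymentConfiguration={
            "minimumHealthyPercent": 0,
            "maximumPercent": 100,
        },
    )
```

With boto3 installed this could be called as `stop_before_start(boto3.client("ecs"), "my-cluster", "qdrant")`, where the cluster and service names are placeholders. The trade-off is a short period with no running instance during each deployment.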
