Comments (12)
How do you deploy qdrant in your k8s?
We have several ready-made solutions like https://github.com/qdrant/qdrant-helm or hybrid-cloud https://hybrid-cloud.qdrant.tech/ where those problems are all resolved
from qdrant.
It's just a simple ECS task deploy, where EFS is used to provide persistence. Is it possible that I need a non-shareable disk? I saw in that Helm example that the PVC is of type ReadWriteOnce.
from qdrant.
@generall If I delete all the LOCK files on disk using:
sudo rm */0/segments/*/payload_index/LOCK && sudo rm */0/segments/*/LOCK
and then trigger a rolling update, it goes fine and the new Qdrant instance recreates the LOCKs.
Now, why is this "Failed to load local shard" error happening even when no other Qdrant instance is up at the same time? In that situation no process is holding the LOCK at all, hence the new Qdrant instance should be able to access the collection shards and restore them properly.
Also, when a Qdrant instance is shut down, it should clean up the LOCK files properly. (I can see this locally as well: even when Qdrant is shut down, LOCK files are all over the place. Is there a reason for this? This is even weirder given that I cannot reproduce this panic locally.)
from qdrant.
Hey @msciancalepore98, I can't give you any guarantees about how Qdrant will work, or about what is expected to happen or not, if you continue butchering storage internals like this.
from qdrant.
If you could actually help with proper debugging hints, that would be great as well. I am trying different things to get to the root cause of this behaviour with EFS.
Also, I can't use any auto-managed solution in my environment, only deploying tasks on ECS.
from qdrant.
We never tested Qdrant on EFS, and I am not sure it is a good idea to use it. Also, I don't know what exactly you are trying to do, but if you are trying to mount the same FS to multiple instances of Qdrant, it is not going to work.
from qdrant.
I am facing a similar issue. Commenting to stay up to date.
from qdrant.
> Is it possible that I need a non sharable disk?
Correct. At least, in terms of file shares, we do recommend not to use this.
Also, each instance must have its very own storage directory. These cannot be shared. The cluster itself will take care of sharing all your data across the cluster and putting it in each storage directory separately.
> We never tested qdrant on EFS
@generall In their FAQ they do promise strong consistency and support for proper file locking. But I also feel like we've seen issues with this before.
from qdrant.
Hi @timvisee, I'm encountering a similar issue. Would you have any advice on how to preserve the existing collections while resolving the LOCK error? The Qdrant instance we've got running on ECS keeps crashing for this reason, and I don't see a way to resolve it without rebuilding the whole thing from scratch. Thanks in advance, I appreciate the help!
To add a bit more info: we're deploying Qdrant on ECS with an EFS mount. We were facing the "too many open files" error and increased the limit to 120k, but soon after we encountered a disk quota error. After asking on the Qdrant Discord, we tried updating from 1.6.1 to 1.9.0, which did not resolve the issue. We are now facing this LOCK problem after reverting to version 1.6.1 with the same setup.
from qdrant.
And this happens on every restart, and you're 100% sure you don't have another instance running on the same data?
To be honest, I'm not entirely sure. We haven't hit this ourselves yet.
You might end up having to purge lock files yourself, but I have no idea what other damage that might do.
from qdrant.
> And this happens on every restart, and you're 100% sure you don't have another instance running on the same data?
This is an issue, for example, when you deploy Qdrant in a service that automatically monitors and relaunches it on failure, like ECS or Kubernetes. Imagine that Kubernetes is doing a health check on the Qdrant endpoint. The check starts to fail, so Kubernetes launches a new Qdrant to replace the old one and connects it to the same storage, which still has the LOCKs from the failed instance. Qdrant should have some sort of env-var or configuration to do a clean-up on start and remove any locks from previous failed instances / runs.
from qdrant.
> Qdrant should have some sort of env-var or configuration to do a clean-up on start and remove any locks from previous failed instances / runs.
As far as I'm aware, it does this already.
Running locally and killing with kill -9 doesn't show this. We don't see this problem in normal k8s operation either. That's why I wonder whether locking on EFS is as good as they promise it to be.
Or are you saying the failed instance is still running while the new instance starts? In that case this would be expected behavior and that should be prevented.
I'll try to do some debugging later to see whether I can catch the same problem.
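For anyone who wants to check this without deleting anything: a LOCK file that exists on disk is not necessarily a lock that is held. The sketch below is only a rough diagnostic. It assumes the LOCK files are guarded by POSIX fcntl-style advisory locks (an assumption about the storage internals, not something confirmed in this thread), and the helper names find_lock_files, is_lock_held and report are made up for illustration. On a local filesystem the kernel releases such locks when the owning process dies, which would fit kill -9 not reproducing the problem locally. On a network filesystem the lock state lives on the server side, so a stale lock may look held even with no instance running.

```python
import errno
import fcntl
import os
from pathlib import Path


def find_lock_files(storage_root):
    """Return every regular file named LOCK below storage_root (hypothetical helper)."""
    return sorted(p for p in Path(storage_root).rglob("LOCK") if p.is_file())


def is_lock_held(lock_path):
    """Return True if another process holds a POSIX advisory lock on lock_path.

    Assumption: the file is protected with fcntl-style locks. The probe takes
    the lock for an instant, so avoid running it while a node is starting up.
    """
    fd = os.open(lock_path, os.O_RDWR)
    try:
        try:
            # Non-blocking attempt: fails immediately if someone else owns the lock.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return True
            raise
        # We got the lock, so nobody else had it. Give it back right away.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)


def report(storage_root):
    """Map each LOCK file path to whether it is currently held by another process."""
    return {str(p): is_lock_held(p) for p in find_lock_files(storage_root)}
```

Running report() on the mounted storage directory from a separate process, with every Qdrant instance stopped, should give False for all entries if locks are being released properly. Any True at that point would suggest the filesystem is still reporting a lock that no live process owns.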
from qdrant.