Comments (4)
Hi, I see here that a message is already displayed when `idle_timeout` is exceeded. Do I need to implement something similar in some other file? Any guidance would be much appreciated, as I'm not deeply familiar with the codebase.
from dask-gateway.
I see no action point in this issue that seems reasonable to pursue any more. The option would be to provide a "reason" and propagate it from the scheduler, but that may be a bit too complicated and would require touching a lot of things, so I don't think it's worth doing.
I'll go for a close on this issue @udeet27, THANK YOU for initiating an investigation!! I'm sorry it was an issue that didn't turn out resolvable =/
Ohh wow. It's a lot more complicated than I initially anticipated. Thanks for the detailed explanation. I'll look into the other issues and see if I can contribute to them.
@udeet27 I don't have a good overview of the code base either, so I had to dig in myself to help. Doing so left me uncertain what to do, because this can't be fixed easily. In brief, there are the controller, the dask-gateway-server, and the dask-scheduler. The `idle_timeout` was logged by the scheduler, which communicated a shutdown to the dask-gateway-server, which in turn made the controller do the job, but no information was passed from the scheduler about why the cluster was to be terminated. So there is no way for the dask-gateway-server to convey that to the controller either.
Looking in this search I found this:
Okay hmm, it seems that this is how things work:
If a dask-cluster is created, it's the dask-cluster's scheduler that is responsible for shutting down the cluster. So the scheduler is the one logging that it is terminating the cluster it's part of. The full sequence is:
- A dask-gateway client somewhere asks the dask-gateway server to start a DaskCluster
- A dask cluster is created using a KubeBackend, which creates a k8s DaskCluster resource that is managed by a "controller" watching DaskCluster resources
- The controller sees the DaskCluster resource and creates a dask cluster scheduler
- The dask cluster scheduler monitors its own activity and asks the dask-gateway server to terminate the cluster it is managing after having idled for too long. When it does, it doesn't pass a reason or similar for terminating.
- The dask-gateway server receives the request to terminate the cluster, but doesn't know that it's due to inactivity. The dask-gateway server makes the KubeBackend terminate the cluster, which it does by updating the DaskCluster k8s resource to "Stopped", I think
- The controller sees the status update and shuts down the scheduler and workers for the dask cluster.
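To make the missing link concrete, here is a minimal toy model of the chain above. It is only a sketch, not dask-gateway code: the class names, method names, and the `reason` field are all hypothetical stand-ins. It shows that unless a reason is threaded through every hop, the controller has nothing to report beyond "Stopped".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Controller:
    """Hypothetical stand-in for the controller watching DaskCluster resources."""
    log: List[str] = field(default_factory=list)

    def reconcile(self, resource: dict) -> None:
        # The controller only sees the resource; it knows nothing the
        # resource does not carry.
        if resource["status"] == "Stopped":
            reason = resource.get("reason") or "unknown"  # "reason" is a made-up field
            self.log.append(f"shutting down {resource['name']} (reason: {reason})")


@dataclass
class KubeBackend:
    """Hypothetical stand-in for the backend that edits the k8s resource."""
    controller: Controller
    resources: Dict[str, dict] = field(default_factory=dict)

    def start_cluster(self, name: str) -> None:
        self.resources[name] = {"name": name, "status": "Running"}

    def stop_cluster(self, name: str, reason: Optional[str] = None) -> None:
        resource = self.resources[name]
        resource["status"] = "Stopped"
        if reason is not None:
            resource["reason"] = reason
        self.controller.reconcile(resource)


@dataclass
class GatewayServer:
    """Hypothetical stand-in for the dask-gateway server."""
    backend: KubeBackend

    def handle_shutdown_request(self, name: str, reason: Optional[str] = None) -> None:
        self.backend.stop_cluster(name, reason)


@dataclass
class Scheduler:
    """Hypothetical stand-in for the scheduler that notices it is idle."""
    name: str
    gateway: GatewayServer
    idle_timeout: float

    def check_idle(self, idle_seconds: float, send_reason: bool) -> bool:
        if idle_seconds <= self.idle_timeout:
            return False
        # Only the scheduler knows *why* the shutdown is requested.
        reason = "idle_timeout" if send_reason else None
        self.gateway.handle_shutdown_request(self.name, reason)
        return True


def simulate(send_reason: bool) -> List[str]:
    """Run the whole chain once and return what the controller logged."""
    controller = Controller()
    backend = KubeBackend(controller)
    gateway = GatewayServer(backend)
    backend.start_cluster("demo")
    scheduler = Scheduler("demo", gateway, idle_timeout=60.0)
    scheduler.check_idle(idle_seconds=120.0, send_reason=send_reason)
    return controller.log


if __name__ == "__main__":
    print(simulate(send_reason=False))  # mirrors the flow described above
    print(simulate(send_reason=True))   # the "propagate a reason" idea
```

With `send_reason=False` the controller can only log an unknown reason, which matches the behaviour described in the list. Getting the second variant for real would mean changing the scheduler, the server, the backend, and the resource schema together, which is why it touches a lot of things.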