Comments (4)
What is the proposal here? I believe the default option now is to auto-approve nodes, and only schedule on approved and connected nodes. Is any of that still missing?
from bacalhau.
Yeah, the part that is still missing is scheduling only on connected and approved nodes. Currently we schedule on disconnected nodes for some job types and ignore approval state for other job types. Frankly, it's a bit of a mess:
- we need to address this TODO: https://github.com/bacalhau-project/bacalhau/blob/main/pkg/orchestrator/scheduler/batch_service_job.go#L158
- modify this section: https://github.com/bacalhau-project/bacalhau/blob/main/pkg/orchestrator/scheduler/daemon_job.go#L93
- modify this section: https://github.com/bacalhau-project/bacalhau/blob/main/pkg/orchestrator/scheduler/ops_job.go#L110
Or rather than modify, allow these aspects of scheduling to be configured.
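As a rough sketch of what configurable filtering could look like, assuming a per-job-type policy struct: every name here (`Node`, `SchedulingPolicy`, `FilterNodes`) is hypothetical and is not taken from the Bacalhau codebase.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a compute node.
// It is NOT Bacalhau's actual node type.
type Node struct {
	ID        string
	Approved  bool
	Connected bool
}

// SchedulingPolicy is a hypothetical per-job-type configuration that
// states which node properties the scheduler must require.
type SchedulingPolicy struct {
	RequireApproved  bool
	RequireConnected bool
}

// FilterNodes returns the nodes that satisfy the policy, preserving order.
func FilterNodes(nodes []Node, p SchedulingPolicy) []Node {
	var eligible []Node
	for _, n := range nodes {
		if p.RequireApproved && !n.Approved {
			continue
		}
		if p.RequireConnected && !n.Connected {
			continue
		}
		eligible = append(eligible, n)
	}
	return eligible
}

func main() {
	nodes := []Node{
		{ID: "n1", Approved: true, Connected: true},
		{ID: "n2", Approved: true, Connected: false},
		{ID: "n3", Approved: false, Connected: true},
	}

	// A strict policy: only approved and connected nodes are eligible.
	strict := SchedulingPolicy{RequireApproved: true, RequireConnected: true}
	for _, n := range FilterNodes(nodes, strict) {
		fmt.Println(n.ID)
	}
}
```

With one policy value per job type, the batch/service, daemon, and ops schedulers could share the same filter and differ only in configuration.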
Further, we need to ensure that work scheduled on an offline node runs when the node comes back online, which will require #3772. For example, the orchestrator could listen for connected events and create an evaluation to execute the work.
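A minimal sketch of the reconnect idea, assuming the orchestrator keeps a map of jobs that have work waiting on each offline node. The types `NodeEvent`, `Evaluation`, and `Orchestrator` are made up for illustration and do not reflect the actual event or evaluation APIs.

```go
package main

import "fmt"

// NodeEvent is a hypothetical connection-state change notification.
type NodeEvent struct {
	NodeID    string
	Connected bool
}

// Evaluation is a hypothetical request for the scheduler to re-examine a job.
type Evaluation struct {
	JobID  string
	Reason string
}

// Orchestrator tracks, per node, the jobs whose work is waiting for that
// node to come back online (an assumption made for this sketch).
type Orchestrator struct {
	pending map[string][]string // nodeID -> jobIDs
}

// HandleNodeEvent creates one evaluation per waiting job when a node
// reconnects. Disconnect events produce no evaluations.
func (o *Orchestrator) HandleNodeEvent(ev NodeEvent) []Evaluation {
	if !ev.Connected {
		return nil
	}
	jobs := o.pending[ev.NodeID]
	delete(o.pending, ev.NodeID) // avoid re-evaluating on later events
	evals := make([]Evaluation, 0, len(jobs))
	for _, id := range jobs {
		evals = append(evals, Evaluation{JobID: id, Reason: "node-reconnected"})
	}
	return evals
}

func main() {
	o := &Orchestrator{pending: map[string][]string{
		"node-a": {"job-1", "job-2"},
	}}
	for _, e := range o.HandleNodeEvent(NodeEvent{NodeID: "node-a", Connected: true}) {
		fmt.Println(e.JobID, e.Reason)
	}
}
```

Clearing the pending entry once evaluations are created keeps a flapping node from triggering duplicate evaluations for the same work.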
Another point to consider:
How can we allow users to define different scheduling heuristics for compute nodes? For example, nodes in a data center ought to have a stricter connectedness requirement than nodes that are expected to go offline for longer periods of time (e.g. submarine compute nodes).
@wdbaruni previously suggested adding a third timeout in the future, which would allow nodes to be offline for that long before being considered dead.
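One way to picture that extra timeout combined with per-node heuristics is a profile holding two thresholds on heartbeat silence. This is only a sketch: the `Profile` type, the state names, and the durations are all invented for illustration, and the existing timeouts are not modelled here.

```go
package main

import (
	"fmt"
	"time"
)

// NodeState is a hypothetical liveness classification.
type NodeState string

const (
	StateConnected    NodeState = "connected"
	StateDisconnected NodeState = "disconnected"
	StateDead         NodeState = "dead"
)

// Profile is a hypothetical per-node-class tolerance for silence.
type Profile struct {
	DisconnectAfter time.Duration // silence before the node counts as disconnected
	DeadAfter       time.Duration // silence before the node counts as dead
}

// Example profiles; the durations are illustrative assumptions only.
var (
	Datacenter   = Profile{DisconnectAfter: 30 * time.Second, DeadAfter: 5 * time.Minute}
	Intermittent = Profile{DisconnectAfter: 30 * time.Second, DeadAfter: 30 * 24 * time.Hour}
)

// Classify maps the time since the last heartbeat to a node state.
func Classify(silence time.Duration, p Profile) NodeState {
	switch {
	case silence >= p.DeadAfter:
		return StateDead
	case silence >= p.DisconnectAfter:
		return StateDisconnected
	default:
		return StateConnected
	}
}

func main() {
	silence := 2 * time.Hour
	fmt.Println("datacenter:", Classify(silence, Datacenter))
	fmt.Println("intermittent:", Classify(silence, Intermittent))
}
```

Under these assumed values, two hours of silence marks a data center node as dead, while a node with the intermittent profile is merely disconnected and could keep its scheduled work.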
Related Issues (20)
- Housekeeping task to fail jobs exceeding total timeouts
- Reschedule jobs if no nodes were found
- Use new APIs for `wasm run`
- `bacalhau node list` returns error `failed request: invalid node type: nodeTypeUndefined`
- Separate out streaming client into consumer and producer client and add heart beat logic
- Optimize the heart beating mainly, to do more optimistic cleanup instead of just removing all stream ids
- Add end to end tests using devstack for unhappy cases for log streaming
- obtuse error message from `bacalhau docker run` when issuing job with invalid parameters
- Handle graceful shutdown of producer client via context
- Show full nodeID and executionID with `--wide` flag
- Cancel log stream on respective engines if the stream is cancelled due to any reason.
- Simplify node bootstrapping (fx)
- Move `bacalhau create` to `bacalhau job run`
- Move `bacalhau cancel` to `bacalhau job stop`
- Move `bacalhau list` to `bacalhau job list`
- Move `bacalhau id` to `bacalhau agent node`
- Move `bacalhau describe` to `bacalhau job describe`
- Move `bacalhau validate` to `bacalhau job validate`
- Move `bacalhau logs` to `bacalhau job logs`
- KVMigration Tests Flaky