Comments (4)
I restarted Flux. It was in a good state until a queue was started, at which point we got the same results as above.
This was noted in the logs:
[ +33.929493] job-manager[0]: sched.alloc-response: id=f24ehES5k3J7 already allocated
[ +33.929519] job-manager[0]: alloc: stop due to alloc response error: File exists
from flux-core.
After canceling the pending affected job and restarting, the same issue occurred, just with a different job in the queue. Another cancel and restart confirmed this.
This seemed to affect only the jobs in one queue. When the other queues were started individually, their jobs ran without the scheduler alloc-response error.
One possible improvement to avoid the negative alloc pending count would be to decrement the sent_count after the check for a job that already has resources:
diff --git a/src/modules/job-manager/alloc.c b/src/modules/job-manager/alloc.c
index 58cbe0c12..9ba82007f 100644
--- a/src/modules/job-manager/alloc.c
+++ b/src/modules/job-manager/alloc.c
@@ -182,7 +182,6 @@ static void alloc_response_cb (flux_t *h,
             goto teardown;
         }
         (void)json_object_del (R, "scheduling");
-        alloc->sent_count--;
         if (!job) {
             (void)free_request (alloc, id, R);
@@ -200,6 +199,7 @@ static void alloc_response_cb (flux_t *h,
             errno = EEXIST;
             goto teardown;
         }
+        alloc->sent_count--;
         job->R_redacted = json_incref (R);
         if (annotations_update_and_publish (ctx, job, annotations) < 0)
             flux_log_error (h, "annotations_update: id=%s", idf58 (id));
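To illustrate why the ordering in the diff above matters, here is a minimal toy model in C. It is not the job-manager code: `toy_job`, `toy_alloc`, and all three functions are hypothetical names made up for this sketch, and it assumes the negative count comes from a second success response arriving for a job that already has resources.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the job-manager structures (not the real types). */
struct toy_job {
    bool has_resources;
};

struct toy_alloc {
    int sent_count; /* alloc requests sent and not yet answered */
};

/* Ordering before the diff: decrement first, then check for a duplicate. */
static int respond_decrement_first (struct toy_alloc *alloc,
                                    struct toy_job *job)
{
    alloc->sent_count--;
    if (job->has_resources)
        return -1; /* "already allocated": the counter was decremented anyway */
    job->has_resources = true;
    return 0;
}

/* Ordering proposed in the diff: check for a duplicate, then decrement. */
static int respond_check_first (struct toy_alloc *alloc,
                                struct toy_job *job)
{
    if (job->has_resources)
        return -1; /* "already allocated": the counter is left untouched */
    alloc->sent_count--;
    job->has_resources = true;
    return 0;
}

/* Deliver the same success response twice for a single outstanding
 * request and return the resulting sent_count. */
static int sent_count_after_duplicate (int (*respond) (struct toy_alloc *,
                                                       struct toy_job *))
{
    struct toy_alloc alloc = { .sent_count = 1 };
    struct toy_job job = { .has_resources = false };

    (void)respond (&alloc, &job); /* legitimate response */
    (void)respond (&alloc, &job); /* duplicate response */
    return alloc.sent_count;
}
```

In this model, `sent_count_after_duplicate (respond_decrement_first)` returns -1, while `sent_count_after_duplicate (respond_check_first)` returns 0. The sketch covers only the duplicate-response path; it does not model the `!job` branch that appears in the diff.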