Comments (10)
The goal is to add up the resource usage of all of a user's running jobs and prevent them from starting a new job if their resource usage would exceed some limit. The approach that Mark is suggesting sounds like it would work in most cases. In principle a user could exceed the limit by submitting some jobs that only specify nodes and others that only specify cores, but that would probably be rare in practice. Another place where these limits wouldn't work is jobs that specify a number of cores that isn't an even multiple of the cores per node on node-exclusive clusters, as those jobs will effectively reserve more cores than the `ncores` in the jobspec. Once again, though, I don't know how common that scenario actually is in practice. The most common case is probably `-n 1`, and we could pick that up easily by always assuming a running job is using at least one node, but I could also see someone submitting a bunch of `-n 40` jobs on a cluster with 36 cores per node and getting many more jobs than the limit because those aren't being counted as using 72 cores.
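The arithmetic behind that `-n 40` example can be sketched as follows. This is a hypothetical helper, not anything in flux-core, and it assumes a uniform cores-per-node count:

```python
import math

def cores_reserved(ncores_requested, cores_per_node, node_exclusive=True):
    """Cores a job actually occupies.

    On a node-exclusive cluster the allocation rounds up to whole
    nodes, so a job can reserve more cores than its jobspec asks for.
    """
    if not node_exclusive:
        return ncores_requested
    nnodes = math.ceil(ncores_requested / cores_per_node)
    return nnodes * cores_per_node

print(cores_reserved(40, 36))  # 72: two whole 36-core nodes
print(cores_reserved(1, 36))   # 36: even -n 1 occupies a full node
```

So a limit counting only the jobspec's `ncores` would undercount the `-n 40` job by 32 cores per job.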
from flux-core.
Very good points @ryanday36.
@cmoussa1 apologies, I had lost sight of the overall goal for the accounting limits. I think you're headed in the right direction.
Just to continue to be a pain here: if we're going to convert things, it probably makes more sense to convert to `ncores`, because that will work for both node-exclusive and non-exclusive clusters (assuming we can tell if a cluster has a `nodex` match policy and we can properly convert to the actual number of cores reserved for a given `-n` on `nodex` clusters).
To answer your original question, you can get access to the resources in an instance by fetching `resource.R` from the KVS. You'll have to parse the result yourself, though; we don't currently export an API to do that (though we've talked about it). The format for R is described in RFC 20.
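As a rough sketch, counting nodes and cores in a version-1 R object per RFC 20 might look like the following. The `R` payload here is illustrative, and the idset parser is deliberately simplified (comma-separated ranges only); in a live instance the object would come from the KVS, e.g. via the Python `flux.kvs` bindings:

```python
import json

def parse_idset(s):
    """Expand a simple RFC 22 idset like "0-3,7" into a list of ints."""
    ids = []
    for part in s.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids

def count_resources(R):
    """Return (nnodes, ncores) totals from an RFC 20 R object (version 1)."""
    nnodes = ncores = 0
    for entry in R["execution"]["R_lite"]:
        ranks = parse_idset(entry["rank"])
        cores_per_rank = len(parse_idset(entry["children"]["core"]))
        nnodes += len(ranks)
        ncores += len(ranks) * cores_per_rank
    return nnodes, ncores

# Illustrative R: 4 nodes with 36 cores each.
R = json.loads("""
{"version": 1,
 "execution": {"R_lite": [{"rank": "0-3",
                           "children": {"core": "0-35"}}],
               "starttime": 0.0, "expiration": 0.0,
               "nodelist": ["node[0-3]"]}}
""")
print(count_resources(R))  # (4, 144)
```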
I think the most complete solution might be to require both a cores and nodes limit for jobs, and if either is exceeded the job is rejected. This is what we ended up doing with the flux-core policy limits. This is mentioned in a note in flux-config-policy(5):
NOTE: Limit checks take place before the scheduler sees the request, so it is possible to bypass a node limit by requesting only cores, or the core limit by requesting only nodes (exclusively) since this part of the system does not have detailed resource information. Generally node and core limits should be configured in tandem to be effective on resource sets with uniform cores per node. Flux does not yet have a solution for node/core limits on heterogeneous resources.
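For reference, those flux-core limits are configured in tandem under the `[policy.limits]` table; a sketch with illustrative values (see flux-config-policy(5) for the authoritative key names):

```toml
[policy.limits.job-size.max]
nnodes = 8
ncores = 512
```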
OK, that might be an OK start. Are you thinking the limit would be represented like `resources = nnodes + ncores`, or something different? And if a job might exceed a `max_resources`, hold the job?
I was thinking you'd check both values and if either exceeded the configured limit then the job is rejected. If you can't tell how much of a resource is in the jobspec, then just skip that test. That way you are always checking at least one limit.
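That check-what-you-can logic can be sketched as follows (the names are hypothetical, not flux-accounting's actual API; `None` stands for a resource type the jobspec didn't specify):

```python
def check_limits(nnodes, ncores, max_nnodes, max_ncores):
    """Accept a job only if every limit we *can* evaluate is satisfied.

    A None value means that resource count couldn't be determined from
    the jobspec, so that test is skipped; since a jobspec specifies at
    least one of the two, at least one limit is always checked.
    """
    if nnodes is not None and nnodes > max_nnodes:
        return False
    if ncores is not None and ncores > max_ncores:
        return False
    return True

print(check_limits(2, None, 4, 128))    # True: core count unknown, skipped
print(check_limits(None, 256, 4, 128))  # False: core limit exceeded
```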
I think the goal (at least for accounting) here is to be able to enforce a resource limit across all of a user's running jobs. If we go with the above, and a job would exceed either limit (`ncores` or `nnodes`), I believe the job should be held until the user goes back under their limit. @ryanday36 should correct me if I am wrong, however.
But maybe we could just add a `max_ncores` limit to all user rows and check both like you mentioned?
No problem @grondo - I probably should've given more background as to why the limit needed to be there in the first place. So it sounds like we should keep separate counts of both `ncores` and `nnodes` across a user's set of running jobs?
This is mainly why I asked if there was a function to gather total node/core counts on a system with `flux resource info`. 🙂 With this, I could at least estimate a cores-per-node count for that system, and when a user only specifies cores, it could be converted to a rough `nnodes` count. I understand this might not be entirely accurate, especially for systems without a uniform cores-per-node count across all nodes, but perhaps it's an okay start? Sorry if I am still misunderstanding.
(Actually, now that I think about it, if the above sounds okay, then I'm not sure keeping track of `ncores` across a user's set of running jobs is entirely necessary, since we would be converting to `nnodes`.)
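That conversion is just ceiling division on the estimated cores-per-node count; a hypothetical helper:

```python
import math

def ncores_to_nnodes(ncores, cores_per_node):
    """Rough nnodes estimate when a jobspec only specifies cores.

    Assumes a uniform cores-per-node count, so on heterogeneous
    systems this is only an approximation.
    """
    return math.ceil(ncores / cores_per_node)

print(ncores_to_nnodes(40, 36))  # 2
print(ncores_to_nnodes(36, 36))  # 1
```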
Thanks for the advice here. After some time playing around with this, I think I was able to get somewhere. I've opened a PR over in flux-framework/flux-accounting#469 that proposes adding some work during plugin initialization where it tries to at least estimate the cores-per-node count on the system it's loaded on by fetching `resource.R`. This could be a start to actually tracking and estimating a job's resources later on. Let me know what you think.