Comments (10)
Hi @santiagorevale,
I had a discussion with @alneberg about just this only a few days ago. Yes, I think that it should be possible. Something like:
params.memory_per_core = 16.GB

process {
    cpus = 1
    memory = { task.cpus * params.memory_per_core }
    withName: 'bigprocess' {
        cpus = 8
    }
}
This is separate from check_max, though that function can still be wrapped around everything to ensure that nothing goes over the available limit.
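For instance, the derived memory could still pass through check_max (a sketch only, assuming the standard nf-core check_max helper is available in the config):

```groovy
process {
    cpus = 1
    // Derive memory from the CPU count, capped by the configured maximum
    memory = { check_max( task.cpus * params.memory_per_core, 'memory' ) }
}
```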
Is this kind of what you were thinking of?
Phil
from rnaseq.
Hi @ewels,
The issue I'm referring to is not exactly that. It would be more like:
# Params
memory_per_core = 16
# Task requirements
cpus = 1
memory = 32
# Calculation
coreslots = cpus
memslots = ceil(memory / memory_per_core)   # e.g. ceil(32 / 16) = 2
slots = max(coreslots, memslots)
# Re-defining task requirements
cpus = slots
So I was thinking that maybe the following changes should make it work for most scenarios. Let me know what you think about it. Please, double check the code because I don't have experience in Groovy.
nextflow.config
def check_slots( cpus, memory ) {
    if (params.containsKey('memory_per_core')) {
        // Slots needed to reserve enough memory: ceiling of memory / memory_per_core
        def memory_slots = Math.ceil( memory.toGiga() / params.memory_per_core ) as int
        return check_max( Math.max( cpus, memory_slots ), 'cpus' )
    }
    return check_max( cpus, 'cpus' )
}
base.config
// ...
// Re-define "cpus" property
withName:makeSTARindex {
    cpus = { check_slots( 10, 80.GB * task.attempt ) }
    memory = { check_max( 80.GB * task.attempt, 'memory' ) }
    time = { check_max( 5.h * task.attempt, 'time' ) }
}
// ...
Finally, the one scenario that is not properly handled (also not a very common one) is when you need to limit your cpus (you need a fixed number) but, based on memory requirements, you have to request more slots. For example:
# Software ABC requires 32 GB per core
# Memory per core of your cluster is 16 GB
# So you want to be able to do something like:
SLOTS=2
CPUS=1
qsub -pe shmem ${SLOTS} ./ABC --threads=${CPUS}
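One possible workaround (untested, and assuming an SGE-style executor) is to keep cpus fixed and pass the slot count through Nextflow's clusterOptions directive, so the scheduler reservation and the application thread count are set independently:

```groovy
// Sketch: decouple scheduler slots from application threads.
// "-pe shmem" is cluster-specific; the slot count is derived as
// ceil(memory / memory_per_core), mirroring the calculation above.
process {
    withName: 'ABC' {
        cpus = 1
        clusterOptions = { "-pe shmem ${Math.ceil( task.memory.toGiga() / params.memory_per_core ) as int}" }
    }
}
```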
I couldn't come up with a solution that doesn't involve incorporating an additional variable to distinguish between slots and cpus. Thoughts?
Cheers,
Santiago
@pditommaso - any thoughts about handling this within core nextflow somehow?
My understanding is that it just sets the cpus given the task memory and params.memory_per_core, and is therefore just a variation of:

process {
    cpus = 1
    memory = { task.cpus * params.memory_per_core }
}

What am I missing?
Yes, I'm not totally sure either, to be honest.
What happens when you need to limit your cpus (you need a fixed number) but based on memory requirements you have to specify more slots.
I don't see why you would ever need to fix the number like this? Even if the tool only uses 1 CPU you can presumably give the process 2? It wastes a little CPU, but that usually can't be used by other tasks on the same node anyway.
Tend to agree. Let's see what @santiagorevale says.
Hi guys,
The main goal is to be able to set up, in base.config, the base CPU and memory requirements for each task.
However, the way it's currently set up:
withName:makeSTARindex {
    cpus = { check_max( 10, 'cpus' ) }
    memory = { check_max( 80.GB * task.attempt, 'memory' ) }
    time = { check_max( 5.h * task.attempt, 'time' ) }
}
you are assigning a fixed number of 10 cpus, and 80 GB of memory per attempt (a max of 240 GB).
This way of defining the task requirements is not compatible with a scenario where you can't queue a job based on memory requirements. Re-submitting the job with increased memory will produce the exact same result, because memory is allocated based on the number of CPUs specified.
You may wonder: "why don't you redefine every task requirement in your config file to meet your needs?" And that's what I want to avoid. The task requirements should always be met regardless of which engine you pick.
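To illustrate: with the check_slots helper proposed earlier in this thread, a retry would grow the slot count (and with it the reserved memory) instead of a memory value the scheduler ignores. A sketch, reusing the dupradar example from base.config:

```groovy
// Sketch: on retry, memory grows, so the derived slot count grows too
// (assumes the check_slots helper proposed above is in scope).
withName:dupradar {
    cpus = { check_slots( 1, 16.GB * task.attempt ) }
    memory = { check_max( 16.GB * task.attempt, 'memory' ) }
}
```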
My understanding is that it just sets the cpus given the task memory and params.memory_per_core, and is therefore just a variation of process { cpus = 1; memory = { task.cpus * params.memory_per_core } }. What am I missing?
In my scenario, the memory option is useless (sort of). In any case, it should be something like this:
process {
    memory = 64.GB
    cpus = { task.memory / params.memory_per_core }
}
What happens when you need to limit your cpus (you need a fixed number) but based on memory requirements you have to specify more slots.
I don't see why you would ever need to fix the number like this? Even if the tool only uses 1 CPU you can presumably give the process 2?
It wastes a little CPU, but that usually can't be used by other tasks on the same node anyway.
There are applications that allocate memory based on the number of cores: if you specify 1 core, the application will allocate 8 GB of memory; 2 cores, 16 GB; and so on. Thus, specifying slots and specifying CPUs become two independent things. See the following example:
Scenario: the software application requires 8 GB per CPU to run properly.
# 1 CPU assigned to the application
# 1 slot reserved -> 4 GB of memory reserved
# This job will fail.
qsub -b y -pe env 1 bash application -cpus 1 *.fastq.gz

# 1 CPU assigned to the application
# 2 slots reserved -> 8 GB of memory reserved
# This job will work.
qsub -b y -pe env 2 bash application -cpus 1 *.fastq.gz
Please, let me know if this is still confusing.
Cheers,
Santiago
In any case, it should be something like this:
process {
    memory = 64.GB
    cpus = { task.memory / params.memory_per_core }
}
You can do that, just add .giga to the memory attribute, i.e.:

process {
    memory = 64.GB
    cpus = { task.memory.giga / params.memory_per_core }
}
Hi @pditommaso,
Thanks for the tip. However, that's not the appropriate solution. Again, the idea is to define the requirements once and not have to do it again per profile.
Currently, we are defining in base.config stuff like:
// [...]
withName:dupradar {
    cpus = { check_max( 1, 'cpus' ) }
    memory = { check_max( 16.GB * task.attempt, 'memory' ) }
}
withName:featureCounts {
    memory = { check_max( 16.GB * task.attempt, 'memory' ) }
}
// [...]
Now, because the cluster I'm using reserves memory based on core slots, I have to re-define this in my profile file, which ends up looking like:
// [...]
withName:dupradar {
    cpus = { check_max( task.memory.giga / params.memory_per_core, 'cpus' ) }
}
withName:featureCounts {
    cpus = { check_max( task.memory.giga / params.memory_per_core, 'cpus' ) }
}
// [...]
And I'll have to do that for every task. And if any memory requirement changes in the future, the profile will have to be updated again, which is something we should avoid.
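What would avoid the per-task duplication is a single profile-level rule that derives cpus from whatever memory each task already declares. A sketch only (untested; assumes the nf-core check_max helper and a numeric params.memory_per_core):

```groovy
// Profile-level sketch: one rule instead of one override per task.
// Derives the slot count from the task's own memory directive.
process {
    cpus = { check_max( Math.ceil( task.memory.toGiga() / params.memory_per_core ) as int, 'cpus' ) }
}
```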
I will close this, as it's rather a question of how to submit resource requirements than a particular issue with the pipeline. If this is of general concern, you can also open an issue with an improvement suggestion on nf-core/tools, and we might consider/discuss it for the template there.