Documentation of the Snakemake-Profiles project.
License: MIT License
For cluster submissions (which most of these repositories are about), it would be nice if there were a way to rate-limit submissions in addition to defining the maximum number of jobs.
This could be:
resources:
  n_per_min: 30 # rate: <human readable rate> would be even better
The reason is that one of the limiting factors is the bandwidth of network-mounted storage.
If I have 1000 jobs that take 24 h each, starting them with -j1000 would kill the file system, while starting them with -j50 would take 20 times as long.
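A submit-side sketch of what I mean by rate limiting (the class name and approach are illustrative; nothing like this exists in Snakemake today):

```python
import time

class SubmitRateLimiter:
    """Enforce a minimum interval between cluster submissions,
    i.e. at most `per_minute` submissions per minute."""

    def __init__(self, per_minute):
        self.interval = 60.0 / per_minute
        self.last = float("-inf")  # time of the previous submission

    def wait(self):
        # Sleep just long enough to respect the configured rate,
        # then record the submission time.
        delay = self.last + self.interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self.last = time.monotonic()
```

A submit script would call wait() before each qsub/sbatch invocation; since profile submit scripts are separate processes, a real implementation would have to keep the timestamp in a lock-protected file instead of in memory.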
I created a profile mainly for slurm, but it can be adapted to other cluster systems.
I wrote a profile for submitting jobs to HTCondor clusters:
https://github.com/jheuel/htcondor
At least with snakemake 5.32, I can't use my snakemake SGE profile without the new parameter --conda-not-block-search-path-envvars. If I don't provide the parameter, I get an import error when snakemake calls my job submission script:
Traceback (most recent call last):
File "/ebio/abt3_projects/software/dev/ll_pipelines/llmgqc/bin/ll_pipeline_utils/profiles/sge/sge-submit.py", line 7, in <module>
from snakemake.utils import read_job_properties
ModuleNotFoundError: No module named 'snakemake'
The top of my job submission script:
#!/usr/bin/env python3
import os
import sys
import re
import subprocess
from snakemake.utils import read_job_properties
My sge-jobscript.sh:
#!/bin/bash
export OMP_NUM_THREADS=1
# properties = {properties}
if [[ -f ~/.bashrc && $(grep -c "__conda_setup=" ~/.bashrc) -gt 0 && $(grep -c "unset __conda_setup" ~/.bashrc) -gt 0 ]]; then
    echo "Sourcing .bashrc" 1>&2
    . ~/.bashrc
else
    echo "Exporting conda PATH" 1>&2
    export PATH=/ebio/abt3_projects/software/dev/miniconda3_dev/bin:$PATH
fi
How do I avoid having to pass --conda-not-block-search-path-envvars every time I run snakemake? I tried adding conda activate snakemake to the bottom of my sge-jobscript.sh, but that didn't help.
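Another variant I am considering is to initialize conda explicitly in the jobscript and activate an env that has snakemake installed before {exec_job} runs (prefix path and env name as in my setup above; this is untested):

```bash
#!/bin/bash
export OMP_NUM_THREADS=1
# properties = {properties}
# Initialize conda without sourcing the full ~/.bashrc:
. /ebio/abt3_projects/software/dev/miniconda3_dev/etc/profile.d/conda.sh
conda activate snakemake  # env that provides the snakemake module
{exec_job}
```

Whether this avoids the need for --conda-not-block-search-path-envvars presumably depends on how snakemake sanitizes the environment of the job, so it may only be a workaround.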
Please post here in order to join the team.
Hey, thank you for this valuable resource.
I recommend defining guidelines for the keywords to put in the rules, and for their units of time, memory, etc.
Otherwise, switching from one profile to another would require adapting all resource items in the Snakefile, or creating a new snakemake profile.
What do you think?
I wondered if it is possible to write log entries to the main snakemake logfile from the submit or status script?
Hi snakemakers! I don't think this is the right place to ask, but I am not sure where else to go. Is there an introductory explanation for beginners about how to set up a profile and specify resource requirements such as memory and walltime for each rule in a snakefile? Thank you!
Some of the snakemake arguments follow the key=value pattern, for example:
--default-resources mem=50 time=5
I would like to store such arguments in the snakemake profile's yaml file. I tried the three following options, but none of them worked:
default-resources: "mem=50 time=5"

default-resources:
  - mem: 50
  - time: 5

default-resources:
  mem: 50
  time: 5
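For completeness, a fourth form that seems closest to how other repeated key=value options are written in profiles would be a YAML list of key=value strings, something like this (untested, and version-dependent):

```yaml
default-resources:
  - mem=50
  - time=5
```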
Dear @johanneskoester,
I would like to specify resources and threads for all my snakemake rules, but give the users of my pipeline the possibility to modify them easily.
One option is to add a 'threads' and 'mem' entry to the config file for each rule or group of rules. However, I think it would be cleaner to separate this from the workflow and use the --cluster-config file, where default and rule-specific parameters can be set.
But then there are rules where the resource values need to be passed to the command itself, so I would need to read the cluster config file inside the Snakefile.
I wonder what your thoughts on this are.
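As a sketch of what I mean by reading the cluster file inside the Snakefile: the cluster config is plain YAML/JSON, so the Snakefile can load it and look values up with a __default__ fallback. File shape, keys, and the helper name below are illustrative, not a Snakemake API (JSON is used here only to keep the example dependency-free; YAML works the same way via yaml.safe_load):

```python
import json

# Hypothetical cluster config in the same shape as a --cluster-config file.
cluster = json.loads("""
{
  "__default__": {"mem": "4G", "time": "01:00:00"},
  "bwa_mem": {"mem": "16G"}
}
""")

def cluster_resource(rule, key):
    """Rule-specific value if present, else the __default__ section."""
    return cluster.get(rule, {}).get(key, cluster["__default__"][key])
```

A rule's params section could then call cluster_resource(...) to pass the same value to the shell command, at the cost of coupling the Snakefile to the cluster file.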
There are some kinds of resources that are very common to use, e.g. mem, (wall)time, cores, etc.
It would be nice if the documentation provided a list of which resource types should be implemented by profiles and which common names to use to refer to those resources.
This would make the whole collection more portable.
I updated to snakemake v5 and started using groups. Now my cluster submitter script breaks and doesn't find the information in the cluster config file (--cluster-config). Apparently not even the default values can be used.
How should we write the definitions for groups in the cluster config file?
Hello, I am trying to set up a server for a tutorial where all users have access to the same conda-prefix.
My idea is to install the conda envs once so that everybody can use them to run the snakemake pipeline.
However, snakemake now tries to recreate the already-installed environments with a different hash.
Could you explain how this hash is generated? I thought it was based on the checksum of the environment file, or something else that doesn't change between the admin user and the others.
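My current guess is that the hash mixes in more than the file checksum, perhaps something like this sketch (certainly not Snakemake's actual code, just the idea):

```python
import hashlib

def env_hash(env_file_content: bytes, conda_prefix: str) -> str:
    # If the target prefix directory is part of the hash, the same env
    # file installed under a different --conda-prefix (or by a user whose
    # resolved path differs, e.g. via symlinks) gets a different hash.
    md5 = hashlib.md5()
    md5.update(conda_prefix.encode())
    md5.update(env_file_content)
    return md5.hexdigest()
```

If something like that is the case, checking that every user resolves the conda prefix to the exact same absolute path (no symlinks, no $HOME expansion differences) would be the first thing to try.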
Hi,
I have written a profile for SLURM: https://github.com/percyfal/slurm. The repo should follow your guidelines. In addition, there is also code for test-driven development, in which slurm and snakemake containers can be run in docker swarm mode for easy testing.
Cheers,
Per
I am wondering what the best way is to combine Snakemake profiles with drmaa.
As the --cluster-config option is deprecated in favor of --profile, cluster-specific configuration would need to be set via profiles for drmaa as well.
If a profile sets the --cluster option, a command (possibly a Python script) can be added to the profile that collects all the cluster configuration information and is used by Snakemake to submit the jobs. However, when supplying the --drmaa option, no such command can be passed, as Snakemake uses the DRMAA API to submit jobs.
Thus, to add rule-specific cluster configuration one would again need --cluster-config, which is deprecated.
Another option would be to write a cluster-submission command supplied via --cluster that itself uses drmaa. But I am wondering whether that is the intended way to go?
If a repository is forked from another source, its issues are disabled by default.
If I then report an issue with the profile to the original source, it will not automatically propagate to this collection.
Wouldn't it be better to move repositories here instead of forking them (and only fork them if they change)?
Authors would still get credit, because they are listed as committers.
(Anyway, this is just a suggestion, so feel free to ignore it.)
While playing around with cookiecutter I made a 'local' profile. I'm not sure about how useful it is, but since I made it, why not let you decide:
cookiecutter.json
{
  "profile_name": null,
  "cores": null,
  "resources": "",
  "config": "",
  "conda": ["True", "False"],
  "singularity": ["True", "False"],
  "keep_going": ["True", "False"],
  "additional": ""
}
{{cookiecutter.profile_name}}/config.yaml
# execution
cores: {{cookiecutter.cores}}
{% if cookiecutter.resources|length -%}
resources:
{%- for keyval in cookiecutter.resources.replace(' ', '').split(',') -%}
{% set key, value = keyval.replace(':', '=').split('=') %}
- {{key}}: {{value}}
{%- endfor %}
{%- endif %}
{% if cookiecutter.config|length -%}
config:
{%- for keyval in cookiecutter.config.replace(' ', '').split(',') -%}
{% set key, value = keyval.replace(':', '=').split('=') %}
- {{key}}: {{value}}
{%- endfor %}
{%- endif %}
keep-going: {{cookiecutter.keep_going}}
# environment
use-conda: {{cookiecutter.conda}}
use-singularity: {{cookiecutter.singularity}}
{% if cookiecutter.additional|length -%}
# additional options
{%- for keyval in cookiecutter.additional.replace(' ', '').split(',') -%}
{% set key, value = keyval.replace(':', '=').split('=') %}
{{key}}: {{value}}
{%- endfor %}
{%- endif %}
example result:
# execution
cores: 46
resources:
- parallel_downloads: 1
- mem_gb: 64
config:
- ascp_path: $HOME/.aspera/connect/bin/ascp
- ascp_key: $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh
keep-going: True
# environment
use-conda: True
use-singularity: False
# additional options
force: True
summary: True
I've seen that you wrote a long list of arguments which can be given to slurm-submit.py or torque-submit.py. How can I pass them to the script?
In the config.yaml I tried:
time: 600
but it passed 600 as a target to the snakefile, which then looked for a file with that name.
I have made an LSF profile: https://github.com/mbhall88/snakemake-lsf
I have tested it on our cluster with a couple of pipelines and it seems to be working. But if you have suggestions on how I can write proper tests, that would be great. I saw the slurm repo has a pretty extensive test suite...
None of the profiles seem to include an initialization of conda, such as a . ~/.bashrc in which the .bashrc includes the standard setup:
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/ebio/abt3_projects/software/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/ebio/abt3_projects/software/miniconda3/etc/profile.d/conda.sh" ]; then
        . "/ebio/abt3_projects/software/miniconda3/etc/profile.d/conda.sh"
    else
        export PATH="/ebio/abt3_projects/software/miniconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
How does one use conda with any of the snakemake profiles? I've always used the following for my sge profile job script template:
#!/bin/bash
export OMP_NUM_THREADS=1
# properties = {properties}
. ~/.bashrc
{exec_job}
... but since snakemake >5.25 I've been getting Perl and Python sys.path issues (snakemake/snakemake#786). I was wondering if this is due to my job script (e.g., maybe initializing conda isn't needed?). However, if I remove . ~/.bashrc from my job template script, all of the qsub jobs submitted by snakemake die and do not produce a log file. The sge stdout file is empty and the stderr file just contains the standard info, for example:
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 128
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	validate_read2
	1
Select jobs to execute...
WorkflowError:
Failed to solve the job scheduling problem with pulp. Please report a bug and use --scheduler greedy as a workaround:
Pulp: Error while trying to execute, use msg=True for more detailscbc
I'm currently running snakemake 5.30.1. The cluster is running Ubuntu 18.04.5
Great! So for testing, the ideal scenario would be to have a CI setup that executes an ad-hoc LSF cluster (e.g. via docker) to which we would automatically submit a few toy jobs. I am not sure whether such container images are available though.
Originally posted by @johanneskoester in #12 (comment)
I just tried installing the slurm profile using the instructions provided, and it seems that just running cookiecutter on the repository is not sufficient for snakemake to find the profile (and I think this is more general than just slurm).
It would be great if you could add some general installation instructions here, like:
cd ~/.config/snakemake
cookiecutter <repo name>
I've written a reasonably comprehensive snakemake profile for Sun Grid Engine. Please could it be reviewed and forked into the Snakemake-Profiles repo?