Documentation of the Snakemake-Profiles project.
License: MIT License
For cluster submissions (which most of these repositories are about), it would be nice if there were a way to rate-limit submissions in addition to defining the maximum number of jobs.
This could be:
resources:
  n_per_min: 30 # rate: <human readable rate> would be even better
The reason is that one of the limiting factors is the bandwidth of network-mounted storage.
If I have 1000 jobs that take 24 h each, starting them with -j1000 would kill the file system, while starting them with -j50 would take 20 times as long.
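A submit-side sketch of what I mean by rate limiting (the class name and approach are illustrative; nothing like this exists in Snakemake today):

```python
import time

class SubmitRateLimiter:
    """Enforce a minimum interval between cluster submissions,
    i.e. at most `per_minute` submissions per minute."""

    def __init__(self, per_minute):
        self.interval = 60.0 / per_minute
        self.last = float("-inf")  # time of the previous submission

    def wait(self):
        # Sleep just long enough to respect the configured rate,
        # then record the submission time.
        delay = self.last + self.interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self.last = time.monotonic()
```

A submit script would call wait() before each qsub/sbatch invocation; since profile submit scripts are separate processes, a real implementation would have to keep the timestamp in a lock-protected file instead of in memory.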
I created a profile mainly for slurm, but it can be adapted to other cluster systems.
I wrote a profile for submitting jobs to HTCondor clusters:
https://github.com/jheuel/htcondor
At least with snakemake 5.32, I can't use my snakemake SGE profile without the new parameter --conda-not-block-search-path-envvars. If I don't provide the parameter, I get an import error when snakemake calls my job submission script:
Traceback (most recent call last):
File "/ebio/abt3_projects/software/dev/ll_pipelines/llmgqc/bin/ll_pipeline_utils/profiles/sge/sge-submit.py", line 7, in <module>
from snakemake.utils import read_job_properties
ModuleNotFoundError: No module named 'snakemake'
The top of my job submission script:
#!/usr/bin/env python3
import os
import sys
import re
import subprocess
from snakemake.utils import read_job_properties
My sge-jobscript.sh:
#!/bin/bash
export OMP_NUM_THREADS=1
# properties = {properties}
if [[ -f ~/.bashrc && $(grep -c "__conda_setup=" ~/.bashrc) -gt 0 && $(grep -c "unset __conda_setup" ~/.bashrc) -gt 0 ]]; then
    echo "Sourcing .bashrc" 1>&2
    . ~/.bashrc
else
    echo "Exporting conda PATH" 1>&2
    export PATH=/ebio/abt3_projects/software/dev/miniconda3_dev/bin:$PATH
fi
How do I avoid having to pass --conda-not-block-search-path-envvars every time I run snakemake? I tried adding conda activate snakemake to the bottom of my sge-jobscript.sh, but that didn't help.
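Another variant I am considering is to initialize conda explicitly in the jobscript and activate an env that has snakemake installed before {exec_job} runs (prefix path and env name as in my setup above; this is untested):

```bash
#!/bin/bash
export OMP_NUM_THREADS=1
# properties = {properties}
# Initialize conda without sourcing the full ~/.bashrc:
. /ebio/abt3_projects/software/dev/miniconda3_dev/etc/profile.d/conda.sh
conda activate snakemake  # env that provides the snakemake module
{exec_job}
```

Whether this avoids the need for --conda-not-block-search-path-envvars presumably depends on how snakemake sanitizes the environment of the job, so it may only be a workaround.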
Please post here in order to join the team.
Hey, thank you for this valuable resource.
I recommend defining guidelines for the keywords to put in the rules, and for their units of time, memory, etc.
Otherwise, switching from one profile to another would require adapting all resource items in the Snakefile, or creating a new snakemake profile.
What do you think?
I wondered if it is possible to write log entries to the main snakemake logfile from the submit or status script?
Hi snakemakers! I don't think this is the right place to ask, but I am not sure where else to go. Is there an introductory explanation for beginners about how to set up a profile and specify resource requirements such as memory and walltime for each rule in a snakefile? Thank you!
Some of the snakemake arguments follow the key=value pattern, for example:
--default-resources mem=50 time=5
I would like to store such arguments in the snakemake profile's yaml file. I tried the three following options, but none of them worked:
default-resources: "mem=50 time=5"

default-resources:
  - mem: 50
  - time: 5

default-resources:
  mem: 50
  time: 5
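For completeness, a fourth form that seems closest to how other repeated key=value options are written in profiles would be a YAML list of key=value strings, something like this (untested, and version-dependent):

```yaml
default-resources:
  - mem=50
  - time=5
```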
Dear @johanneskoester,
I would like to specify resources and threads for all my snakemake rules, but give the users of my pipeline the possibility to modify them easily.
One option is to add a 'threads' and 'mem' entry to the config file for each rule or group of rules. However, I think it would be cleaner to separate this from the workflow and use the --cluster-config file, where default and rule-specific parameters can be set.
But then there are rules where the resource values need to be passed to the command itself, so I would need to read the cluster config file inside the Snakefile.
I wonder what your thoughts on this are.
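As a sketch of what I mean by reading the cluster file inside the Snakefile: the cluster config is plain YAML/JSON, so the Snakefile can load it and look values up with a __default__ fallback. File shape, keys, and the helper name below are illustrative, not a Snakemake API (JSON is used here only to keep the example dependency-free; YAML works the same way via yaml.safe_load):

```python
import json

# Hypothetical cluster config in the same shape as a --cluster-config file.
cluster = json.loads("""
{
  "__default__": {"mem": "4G", "time": "01:00:00"},
  "bwa_mem": {"mem": "16G"}
}
""")

def cluster_resource(rule, key):
    """Rule-specific value if present, else the __default__ section."""
    return cluster.get(rule, {}).get(key, cluster["__default__"][key])
```

A rule's params section could then call cluster_resource(...) to pass the same value to the shell command, at the cost of coupling the Snakefile to the cluster file.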
There are some kinds of resources that are very common to use, e.g. mem, (wall)time, cores, etc.
It would be nice if the documentation provided a list of which resource types should be implemented by profiles and which common names to use to refer to those resources.
This would make the whole collection more portable.
I updated to snakemake v5 and started using groups. Now my cluster submitter script breaks and doesn't find the information in the cluster config file (--cluster-config). Apparently not even the default values can be used.
How should we write the definitions for groups in the cluster config file?
Hello, I am trying to set up a server for a tutorial where all users have access to the same conda-prefix.
My idea is to install the conda envs once so that everybody can use them to run the snakemake pipeline.
However, snakemake now tries to recreate the already-installed environments with a different hash.
Could you explain how this hash is generated? I thought it was based on the checksum of the environment file, or something else that doesn't change between the admin user and the others.
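My current guess is that the hash mixes in more than the file checksum, perhaps something like this sketch (certainly not Snakemake's actual code, just the idea):

```python
import hashlib

def env_hash(env_file_content: bytes, conda_prefix: str) -> str:
    # If the target prefix directory is part of the hash, the same env
    # file installed under a different --conda-prefix (or by a user whose
    # resolved path differs, e.g. via symlinks) gets a different hash.
    md5 = hashlib.md5()
    md5.update(conda_prefix.encode())
    md5.update(env_file_content)
    return md5.hexdigest()
```

If something like that is the case, checking that every user resolves the conda prefix to the exact same absolute path (no symlinks, no $HOME expansion differences) would be the first thing to try.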
Hi,
I have written a profile for SLURM: https://github.com/percyfal/slurm. The repo should follow your guidelines. In addition, there is also code for test-driven development, in which slurm and snakemake containers can be run in docker swarm mode for easy testing.
Cheers,
Per
I am wondering what the best way is to combine Snakemake profiles with drmaa.
As the --cluster-config option is deprecated in favor of --profile, cluster-specific configuration would need to be set via profiles for drmaa as well.
If a profile sets the --cluster option, a command (possibly a Python script) can be added to the profile that collects all the cluster configuration information and is used by Snakemake to submit the jobs. However, when supplying the --drmaa option, no such command can be passed, as Snakemake uses the DRMAA API to submit jobs.
Thus, to add rule-specific cluster configuration one would again need --cluster-config, which is deprecated.
Another option would be to write a cluster-submission command supplied via --cluster that itself uses drmaa. But I am wondering whether that is the intended way to go?
If a repository is forked from another source, its issues are disabled by default.
If I then report an issue with the profile to the original source, it will not automatically propagate to this collection.
Wouldn't it be better to move repositories here instead of forking them (and only fork them if they change)?
Authors would still get credit, because they are listed as committers.
(Anyway, this is just a suggestion, so feel free to ignore it.)
While playing around with cookiecutter I made a 'local' profile. I'm not sure about how useful it is, but since I made it, why not let you decide:
cookiecutter.json
{
  "profile_name": null,
  "cores": null,
  "resources": "",
  "config": "",
  "conda": ["True", "False"],
  "singularity": ["True", "False"],
  "keep_going": ["True", "False"],
  "additional": ""
}
{{cookiecutter.profile_name}}/config.yaml
# execution
cores: {{cookiecutter.cores}}
{% if cookiecutter.resources|length -%}
resources:
{%- for keyval in cookiecutter.resources.replace(' ', '').split(',') -%}
{% set key, value = keyval.replace(':', '=').split('=') %}
- {{key}}: {{value}}
{%- endfor %}
{%- endif %}
{% if cookiecutter.config|length -%}
config:
{%- for keyval in cookiecutter.config.replace(' ', '').split(',') -%}
{% set key, value = keyval.replace(':', '=').split('=') %}
- {{key}}: {{value}}
{%- endfor %}
{%- endif %}
keep-going: {{cookiecutter.keep_going}}
# environment
use-conda: {{cookiecutter.conda}}
use-singularity: {{cookiecutter.singularity}}
{% if cookiecutter.additional|length -%}
# additional options
{%- for keyval in cookiecutter.additional.replace(' ', '').split(',') -%}
{% set key, value = keyval.replace(':', '=').split('=') %}
{{key}}: {{value}}
{%- endfor %}
{%- endif %}
example result:
# execution
cores: 46
resources:
- parallel_downloads: 1
- mem_gb: 64
config:
- ascp_path: $HOME/.aspera/connect/bin/ascp
- ascp_key: $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh
keep-going: True
# environment
use-conda: True
use-singularity: False
# additional options
force: True
summary: True
I've seen that you wrote a long list of arguments which can be given to slurm-submit.py or torque-submit.py. How can I pass them to the script?
In the config.yaml I tried:
time: 600
but it passed 600 as a target to the snakefile, which then looked for a file with that name.
I have made an LSF profile: https://github.com/mbhall88/snakemake-lsf
I have tested it on our cluster with a couple of pipelines and it seems to be working. But if you have suggestions on how I can write proper tests, that would be great. I saw the slurm repo has a pretty extensive test suite...
None of the profiles seem to include an initialization of conda, such as a . ~/.bashrc in which the .bashrc includes the standard setup:
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/ebio/abt3_projects/software/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/ebio/abt3_projects/software/miniconda3/etc/profile.d/conda.sh" ]; then
        . "/ebio/abt3_projects/software/miniconda3/etc/profile.d/conda.sh"
    else
        export PATH="/ebio/abt3_projects/software/miniconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
How does one use conda with any of the snakemake profiles? I've always used the following for my sge profile job script template:
#!/bin/bash
export OMP_NUM_THREADS=1
# properties = {properties}
. ~/.bashrc
{exec_job}
... but since snakemake >5.25 I've been getting Perl and Python sys.path issues (snakemake/snakemake#786). I was wondering if this is due to my job script (e.g., maybe initializing conda isn't needed?). However, if I remove . ~/.bashrc from my job template script, all of the qsub jobs submitted by snakemake die and do not produce a log file. The sge stdout file is empty and the stderr file just contains the standard info, for example:
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 128
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	validate_read2
	1
Select jobs to execute...
WorkflowError:
Failed to solve the job scheduling problem with pulp. Please report a bug and use --scheduler greedy as a workaround:
Pulp: Error while trying to execute, use msg=True for more detailscbc
I'm currently running snakemake 5.30.1. The cluster is running Ubuntu 18.04.5
Great! So for testing, the ideal scenario would be to have a CI setup that executes an ad-hoc LSF cluster (e.g. via docker) to which we would automatically submit a few toy jobs. I am not sure whether such container images are available though.
Originally posted by @johanneskoester in #12 (comment)
I just tried installing the slurm profile using the instructions provided, and it seems that just running cookiecutter on the repository is not sufficient for snakemake to find the profile (and I think this is more general than just slurm).
It would be great if you could add some general installation instructions here, like:
cd ~/.config/snakemake
cookiecutter <repo name>
I've written a reasonably comprehensive snakemake profile for Sun Grid Engine. Please could it be reviewed and forked into the Snakemake-Profiles repo?