4dn-dcic / tibanna

Tibanna helps you run your genomic pipelines on the Amazon cloud (AWS). It is used by the 4DN DCIC (4D Nucleome Data Coordination and Integration Center) to process data. Tibanna supports CWL/WDL (with Docker), Snakemake (with conda), and custom Docker/shell commands.

License: MIT License

Python 60.71% Shell 5.39% HTML 33.28% Common Workflow Language 0.16% Dockerfile 0.30% Makefile 0.17%
aws aws-lambda aws-step-function bioinformatics conda cwl cwl-workflow docker pipelines snakemake snakemake-workflow wdl wdl-workflow

tibanna's Introduction

Tibanna



Tibanna runs portable pipelines (in CWL/WDL/Snakemake/shell) on the AWS Cloud.


Install Tibanna.

pip install tibanna

Use the CLI to set up the cloud components and run a workflow.

# Deploy Unicorn to the Cloud (Unicorn = serverless scheduler/resource allocator).
tibanna deploy_unicorn --usergroup=mygroup

# Run CWL/WDL workflow on the Cloud.
tibanna run_workflow --input-json=myrun.json

Alternatively, use the Python API.

from tibanna.core import API

# Deploy Unicorn to the Cloud.
API().deploy_unicorn(usergroup='mygroup')

# Run CWL/WDL workflow on the Cloud.
API().run_workflow(input_json='myrun.json')
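
A run is described by an input JSON with an "args" section (what to run and where its inputs and outputs live on S3) and a "config" section (EC2 resources and the log bucket). A minimal sketch follows; the field names mirror the run JSON printed in one of the Snakemake issue logs further down, but the bucket names, file keys, and command here are placeholders, and the exact set of required fields depends on the workflow language (see the Tibanna documentation):

{
    "args": {
        "language": "snakemake",
        "container_image": "snakemake/snakemake:v5.14.0",
        "command": "snakemake mytarget --cores 1",
        "input_files": {
            "file:///data1/snakemake/input.fq": "s3://my-data-bucket/input.fq"
        },
        "output_S3_bucket": "my-data-bucket",
        "output_target": {
            "file:///data1/snakemake/mytarget": "s3://my-data-bucket/mytarget"
        }
    },
    "config": {
        "cpu": 1,
        "mem": 1,
        "ebs_size": 10,
        "log_bucket": "my-log-bucket"
    }
}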


Note: Starting with version 0.8.2, Tibanna supports local CWL/WDL files as well as shell commands and Snakemake workflows.

Note 2: As of Tibanna version 2.0.0, Python 3.7 (and lower) is no longer supported. Please switch to Python 3.11! Python 3.8 is also supported as a fallback, but please prefer 3.11 if you can.

Note 3: Starting with version 0.8.0, you no longer need to git clone the Tibanna repo.

  • Please switch from invoke <command> to tibanna <command>!
  • We also reworked the Python API into an inheritable class to allow development around Tibanna (a small sketch follows below).
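
For example, a project can wrap Tibanna by subclassing the API class. A minimal sketch (the subclass and its wrapper method are hypothetical; only API and run_workflow come from Tibanna itself):

from tibanna.core import API

class MyTibanna(API):
    """Project-specific wrapper around the Tibanna API."""
    def run_my_workflow(self, input_json):
        # apply project defaults here, then delegate to the stock run_workflow
        return self.run_workflow(input_json=input_json)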

For more details, see Tibanna Documentation.

tibanna's People

Contributors

alexander-veit, aschroed, b3rse, carlvitzthum, delagoya, dmichaels-harvard, doctormo, drio18, gdrosos, j1z0, karatugo, koraykirli, lmercadom, mccalluc, netsettler, pkerpedjiev, soolee, willronchetti


tibanna's Issues

Error running tibanna commands

Hi, I'm trying to use tibanna to run Snakemake pipelines on AWS. I get the following error message when I try running any tibanna command, including tibanna -v.

(smk_tibanna) PN119683:reference_genomes nolson$ tibanna -v
Traceback (most recent call last):
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/urllib3/connection.py", line 157, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/urllib3/util/connection.py", line 61, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/httpsession.py", line 263, in send
    chunked=self._chunked(request.headers),
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/urllib3/util/retry.py", line 376, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/urllib3/connectionpool.py", line 376, in _make_request
    self._validate_conn(conn)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/urllib3/connectionpool.py", line 994, in _validate_conn
    conn.connect()
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/urllib3/connection.py", line 334, in connect
    conn = self._new_conn()
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/urllib3/connection.py", line 169, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPSConnection object at 0x7fc4007b50b8>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/tibanna/vars.py", line 8, in <module>
    AWS_ACCOUNT_NUMBER = boto3.client('sts').get_caller_identity().get('Account')
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/client.py", line 272, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/client.py", line 563, in _make_api_call
    operation_model, request_dict, request_context)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/client.py", line 582, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/endpoint.py", line 137, in _send_request
    success_response, exception):
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/endpoint.py", line 231, in _needs_retry
    caught_exception=caught_exception, request_dict=request_dict)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/retryhandler.py", line 183, in __call__
    if self._checker(attempts, response, caught_exception):
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/retryhandler.py", line 251, in __call__
    caught_exception)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/retryhandler.py", line 277, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/retryhandler.py", line 317, in __call__
    caught_exception)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/retryhandler.py", line 223, in __call__
    attempt_number, caught_exception)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
    raise caught_exception
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/endpoint.py", line 200, in _do_get_response
    http_response = self._send(request)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/endpoint.py", line 244, in _send
    return self.http_session.send(request)
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/botocore/httpsession.py", line 283, in send
    raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://sts.us-east-1a.amazonaws.com/"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/nolson/anaconda3/envs/smk_tibanna/bin/tibanna", line 5, in <module>
    from tibanna.__main__ import main
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/tibanna/__main__.py", line 10, in <module>
    from .core import API
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/tibanna/core.py", line 16, in <module>
    from .vars import (
  File "/Users/nolson/anaconda3/envs/smk_tibanna/lib/python3.6/site-packages/tibanna/vars.py", line 10, in <module>
    raise Exception("Cannot find AWS_ACCOUNT_NUMBER: %s" % e)
Exception: Cannot find AWS_ACCOUNT_NUMBER: Could not connect to the endpoint URL: "https://sts.us-east-1a.amazonaws.com/"

Any help troubleshooting this error would be appreciated.

Thanks
Nate

Feature Request - Add Cores used to tibanna stat output

Tibanna is great for scaling up analysis on AWS, but it doesn't play nicely with CPU quotas (#267). While error handling for this special case would be appreciated, it would also be nice to be able to track how much of the quota is being used by running jobs. I'm imagining that, in addition to reporting the instance type when calling "tibanna stat", we could also report how many CPUs are associated with that instance type. This doesn't eliminate the need for proper quota handling in Tibanna, but it should make it easier for users to tell when they are close to their CPU quota, assuming they know what that quota is (which they should).

Unless there is a reason that this feature is a bad idea, I think I'll look into making a pull request for it.

I'm not entirely sure what the best way to implement this is. The easiest approach is probably to add a dictionary somewhere that maps instance types to CPU counts and update it manually as AWS updates its EC2 offerings. That dictionary could then be used during the 'stat' calls to add more information to the table. The drawback is that it requires manual code updates whenever AWS changes its EC2 offerings, and it feels generally clunky. Thoughts?
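
As a rough alternative to a hand-maintained table, the vCPU count could be looked up from EC2 itself at stat time. A sketch using boto3 (not part of Tibanna; caching the results is left open):

import boto3

def vcpus_for_instance_type(instance_type):
    """Return the default vCPU count for an EC2 instance type, e.g. 'c5.xlarge' -> 4."""
    ec2 = boto3.client('ec2')
    resp = ec2.describe_instance_types(InstanceTypes=[instance_type])
    return resp['InstanceTypes'][0]['VCpuInfo']['DefaultVCpus']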

You're probably deploying even when tests fail

.travis.yml:

script:
- invoke test --ignore-webdev  # temporarily ignore test_webdev which uses tibanna_pony
- source tests/awsf/test.sh
- echo "finished running tests on repo"
- echo " now deploy core lambda pacakges"
- if [ "$TRAVIS_BRANCH" = "production" ]; then echo "Test succeeded! PUBLISHING LAMBDAs";
invoke deploy_core all; fi

but ...

When one of the build commands returns a non-zero exit code, the Travis CI build runs the subsequent commands as well, and accumulates the build result.

(Other folks have been confused by this.)

The simplest fix is probably just to move the deploy step out of script and into a later phase (see the sketch below). I'll make a PR and reference this issue.
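
One possible arrangement (a sketch only, not necessarily what the referenced PR does): keep only the tests in script, and run the deploy commands in a phase that Travis skips when the script phase fails, such as after_success:

script:
- invoke test --ignore-webdev
- source tests/awsf/test.sh
after_success:
- if [ "$TRAVIS_BRANCH" = "production" ]; then echo "Tests passed, publishing lambdas"; invoke deploy_core all; fi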

Add support for nvidia-docker or singularity.

It would be great to be able to either configure a custom AMI where different container runtimes are available, or to run custom scripts that install other utilities.

Complete Teardown of tibanna

Thank you for developing this tool. I've been successfully using Tibanna.

I've followed the Tibanna documentation, but I'm not sure how to do a complete teardown after tibanna deploy_unicorn. I can see a step function being provisioned; are there any other resources? How about temporary files and logs?

Is there a simple command to remove everything?

Issues with STAR alignment reading external files

Hello. I am currently running an RNAseq pipeline through the Snakemake/Tibanna framework. I am having trouble when it comes to the alignment step. My alignment step keeps failing with the error

"EXITING because of FATAL ERROR: could not open genome file /Users/jeremygoldstein/data/refseq/Homo_Sapien/STAR_index/genomeParameters.txt"
However, when I run the exact same command with normal Snakemake, the alignment works fine, which makes me think this is an issue with Tibanna and not STAR. Attached is my specific rule for STAR.

Run with the command <snakemake --cores>, the rule runs fine and produces the right output. However, with no change to the rule, the command <snakemake --tibanna --default-remote-prefix=rnaseq.raw --use-conda> results in the error mentioned above. Below is the output of the specific job run. Any ideas on what might be causing this problem?

Fri May 29 21:46:35 UTC 2020
--2020-05-29 21:46:35-- https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf//aws_decode_run_json.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.64.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.64.133|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /4dn-dcic/tibanna/master/awsf/aws_decode_run_json.py [following]
--2020-05-29 21:46:35-- https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf/aws_decode_run_json.py
Reusing existing connection to raw.githubusercontent.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 12965 (13K) [text/plain]
Saving to: ‘aws_decode_run_json.py’

 0K .......... ..                                         100%  110M=0s

2020-05-29 21:46:36 (110 MB/s) - ‘aws_decode_run_json.py’ saved [12965/12965]

--2020-05-29 21:46:36-- https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf//aws_update_run_json.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.64.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.64.133|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /4dn-dcic/tibanna/master/awsf/aws_update_run_json.py [following]
--2020-05-29 21:46:36-- https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf/aws_update_run_json.py
Reusing existing connection to raw.githubusercontent.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 699 [text/plain]
Saving to: ‘aws_update_run_json.py’

 0K                                                       100% 67.6M=0s

2020-05-29 21:46:36 (67.6 MB/s) - ‘aws_update_run_json.py’ saved [699/699]

--2020-05-29 21:46:36-- https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf//aws_upload_output_update_json.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.64.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.64.133|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /4dn-dcic/tibanna/master/awsf/aws_upload_output_update_json.py [following]
--2020-05-29 21:46:36-- https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf/aws_upload_output_update_json.py
Reusing existing connection to raw.githubusercontent.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 9212 (9.0K) [text/plain]
Saving to: ‘aws_upload_output_update_json.py’

 0K ........                                              100%  108M=0s

2020-05-29 21:46:36 (108 MB/s) - ‘aws_upload_output_update_json.py’ saved [9212/9212]

--2020-05-29 21:46:36-- https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf//download_workflow.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.64.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.64.133|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /4dn-dcic/tibanna/master/awsf/download_workflow.py [following]
--2020-05-29 21:46:43-- https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf/download_workflow.py
Reusing existing connection to raw.githubusercontent.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 2076 (2.0K) [text/plain]
Saving to: ‘download_workflow.py’

 0K ..                                                    100% 58.8M=0s

2020-05-29 21:46:43 (58.8 MB/s) - ‘download_workflow.py’ saved [2076/2076]

rnaseq.raw
download: s3://rnaseq.raw/hIeV9EGFDIMF.run.json to ./hIeV9EGFDIMF.run.json
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:1 0 8G 0 disk
└─nvme0n1p1 259:2 0 8G 0 part /
nvme1n1 259:0 0 1G 0 disk
mke2fs 1.42.13 (17-May-2015)
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: e1026635-9ac0-4cca-a000-18d050257869
Superblock backups stored on blocks:
32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

main workflow file: Snakefile
workflow files: ['workflow/envs/fastQC-trimGalore.yaml', 'workflow/rules/trimGalore.smk', 'workflow/rules/STAR.smk', 'workflow/rules/fastQC.smk', 'workflow/envs/STAR.yaml', 'Snakefile']
downloading key hIeV9EGFDIMF.workflow/workflow/envs/fastQC-trimGalore.yaml from bucket rnaseq.raw to target /data1/snakemake/workflow/envs/fastQC-trimGalore.yaml
downloading key hIeV9EGFDIMF.workflow/workflow/rules/trimGalore.smk from bucket rnaseq.raw to target /data1/snakemake/workflow/rules/trimGalore.smk
downloading key hIeV9EGFDIMF.workflow/workflow/rules/STAR.smk from bucket rnaseq.raw to target /data1/snakemake/workflow/rules/STAR.smk
downloading key hIeV9EGFDIMF.workflow/workflow/rules/fastQC.smk from bucket rnaseq.raw to target /data1/snakemake/workflow/rules/fastQC.smk
downloading key hIeV9EGFDIMF.workflow/workflow/envs/STAR.yaml from bucket rnaseq.raw to target /data1/snakemake/workflow/envs/STAR.yaml
downloading key hIeV9EGFDIMF.workflow/Snakefile from bucket rnaseq.raw to target /data1/snakemake/Snakefile
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 152 100 152 0 0 6657 0 --:--:-- --:--:-- --:--:-- 6909
100 617 100 617 0 0 2448 0 --:--:-- --:--:-- --:--:-- 2448
100 13.3M 100 13.3M 0 0 28.7M 0 --:--:-- --:--:-- --:--:-- 28.7M
user_allow_other
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Login Succeeded
if [[ -z $(aws s3 ls s3://rnaseq.raw/results/trimming/LCS7616_RR_C1_SEQ5_1_R2_val_2.fq/ ) ]]; then aws s3 cp s3://rnaseq.raw/results/trimming/LCS7616_RR_C1_SEQ5_1_R2_val_2.fq /data1/snakemake/rnaseq.raw/results/trimming/LCS7616_RR_C1_SEQ5_1_R2_val_2.fq ; else aws s3 cp --recursive s3://rnaseq.raw/results/trimming/LCS7616_RR_C1_SEQ5_1_R2_val_2.fq /data1/snakemake/rnaseq.raw/results/trimming/LCS7616_RR_C1_SEQ5_1_R2_val_2.fq ; fi
if [[ -z $(aws s3 ls s3://rnaseq.raw/results/trimming/LCS7616_RR_C1_SEQ5_1_R1_val_1.fq/ ) ]]; then aws s3 cp s3://rnaseq.raw/results/trimming/LCS7616_RR_C1_SEQ5_1_R1_val_1.fq /data1/snakemake/rnaseq.raw/results/trimming/LCS7616_RR_C1_SEQ5_1_R1_val_1.fq ; else aws s3 cp --recursive s3://rnaseq.raw/results/trimming/LCS7616_RR_C1_SEQ5_1_R1_val_1.fq /data1/snakemake/rnaseq.raw/results/trimming/LCS7616_RR_C1_SEQ5_1_R1_val_1.fq ; fi
Fri May 29 21:47:16 UTC 2020
Fri May 29 21:47:19 UTC 2020
aws_decode_run_json.py
aws_update_run_json.py
aws_upload_output_update_json.py
cromwell
download_command_list.txt
download_workflow.py
env_command_list.txt
goofys-latest
hIeV9EGFDIMF.run.json
mount_command_list.txt
Fri May 29 21:47:19 UTC 2020
Fri May 29 21:47:19 UTC 2020
aws_decode_run_json.py
aws_update_run_json.py
aws_upload_output_update_json.py
cromwell
download_command_list.txt
download_workflow.py
env_command_list.txt
goofys-latest
hIeV9EGFDIMF.run.json
mount_command_list.txt
Filesystem 1K-blocks Used Available Use% Mounted on
udev 32658188 0 32658188 0% /dev
tmpfs 6533236 8644 6524592 1% /run
/dev/nvme0n1p1 8065444 2294780 5754280 29% /
tmpfs 32666176 0 32666176 0% /dev/shm
tmpfs 5120 0 5120 0% /run/lock
tmpfs 32666176 0 32666176 0% /sys/fs/cgroup
/dev/nvme1n1 999320 8116 922392 1% /data1
/home/ubuntu
total 108K
-rwxr-xr-x 1 root root 15K May 29 21:46 aws_run_workflow_generic.sh
drwxr-xr-x 2 root root 4.0K Oct 12 2018 bin
drwxr-xr-x 3 root root 4.0K Oct 12 2018 boot
drwxr-xr-x 7 ubuntu root 4.0K May 29 21:46 data1
drwxr-xr-x 13 root root 3.3K May 29 21:46 dev
drwxr-xr-x 99 root root 4.0K May 29 21:47 etc
-rw-r--r-- 1 root root 0 May 29 21:46 hIeV9EGFDIMF.job_started
drwxr-xr-x 3 root root 4.0K May 31 2018 home
lrwxrwxrwx 1 root root 30 Oct 12 2018 initrd.img -> boot/initrd.img-4.4.0-1069-aws
lrwxrwxrwx 1 root root 30 May 22 2018 initrd.img.old -> boot/initrd.img-4.4.0-1060-aws
drwxr-xr-x 21 root root 4.0K May 31 2018 lib
drwxr-xr-x 2 root root 4.0K May 22 2018 lib64
drwx------ 2 root root 16K May 22 2018 lost+found
drwxr-xr-x 2 root root 4.0K May 22 2018 media
drwxr-xr-x 2 root root 4.0K May 22 2018 mnt
drwxr-xr-x 2 root root 4.0K May 22 2018 opt
dr-xr-xr-x 177 root root 0 May 29 21:46 proc
drwx------ 7 root root 4.0K May 29 21:47 root
drwxr-xr-x 24 root root 940 May 29 21:46 run
drwxr-xr-x 2 root root 12K Oct 12 2018 sbin
drwxr-xr-x 2 root root 4.0K May 31 2018 snap
drwxr-xr-x 2 root root 4.0K May 22 2018 srv
dr-xr-xr-x 13 root root 0 May 29 21:46 sys
drwxrwxrwt 8 root root 4.0K May 29 21:47 tmp
drwxr-xr-x 10 root root 4.0K May 22 2018 usr
drwxr-xr-x 13 root root 4.0K May 22 2018 var
lrwxrwxrwx 1 root root 27 Oct 12 2018 vmlinuz -> boot/vmlinuz-4.4.0-1069-aws
lrwxrwxrwx 1 root root 27 May 22 2018 vmlinuz.old -> boot/vmlinuz-4.4.0-1060-aws
total 32K
drwxr-xr-x 2 root root 4.0K May 29 21:46 input
drwx--x--x 2 ubuntu root 16K May 29 21:46 lost+found
drwxr-xr-x 2 root root 4.0K May 29 21:46 out
drwxr-xr-x 2 root root 4.0K May 29 21:46 reference
drwxr-xr-x 4 root root 4.0K May 29 21:47 snakemake
/data1/input:
total 0
/data1/snakemake:
total 12K
drwxr-xr-x 3 root root 4.0K May 29 21:47 rnaseq.raw
-rw-r--r-- 1 root root 2.5K May 29 21:46 Snakefile
drwxr-xr-x 4 root root 4.0K May 29 21:46 workflow

/data1/snakemake/rnaseq.raw:
total 4.0K
drwxr-xr-x 3 root root 4.0K May 29 21:47 results

/data1/snakemake/rnaseq.raw/results:
total 4.0K
drwxr-xr-x 2 root root 4.0K May 29 21:47 trimming

/data1/snakemake/rnaseq.raw/results/trimming:
total 6.7M
-rw-r--r-- 1 root root 3.3M May 29 16:20 LCS7616_RR_C1_SEQ5_1_R1_val_1.fq
-rw-r--r-- 1 root root 3.3M May 29 16:20 LCS7616_RR_C1_SEQ5_1_R2_val_2.fq

/data1/snakemake/workflow:
total 8.0K
drwxr-xr-x 2 root root 4.0K May 29 21:46 envs
drwxr-xr-x 2 root root 4.0K May 29 21:46 rules

/data1/snakemake/workflow/envs:
total 8.0K
-rw-r--r-- 1 root root 100 May 29 21:46 fastQC-trimGalore.yaml
-rw-r--r-- 1 root root 79 May 29 21:46 STAR.yaml

/data1/snakemake/workflow/rules:
total 12K
-rw-r--r-- 1 root root 566 May 29 21:46 fastQC.smk
-rw-r--r-- 1 root root 1.9K May 29 21:46 STAR.smk
-rw-r--r-- 1 root root 1.1K May 29 21:46 trimGalore.smk
running snakemake rnaseq.raw/results/alignment/LCS7616_RR_C1_SEQ5_1/Aligned.out.sam --snakefile Snakefile --force -j1 --keep-target-files --keep-remote --latency-wait 0 --attempt 1 --force-use-threads --allowed-rules STAR --nocolor --notemp --no-hooks --nolock --use-conda in docker image snakemake/snakemake:v5.14.0...
Unable to find image 'snakemake/snakemake:v5.14.0' locally
v5.14.0: Pulling from snakemake/snakemake
070ec86b6816: Pulling fs layer
c8e04ffeb25c: Pulling fs layer
153f09d1a5f5: Pulling fs layer
3e879c9d7f8f: Pulling fs layer
3e879c9d7f8f: Waiting
070ec86b6816: Verifying Checksum
070ec86b6816: Download complete
3e879c9d7f8f: Verifying Checksum
3e879c9d7f8f: Download complete
c8e04ffeb25c: Verifying Checksum
c8e04ffeb25c: Download complete
070ec86b6816: Pull complete
c8e04ffeb25c: Pull complete
153f09d1a5f5: Verifying Checksum
153f09d1a5f5: Download complete
153f09d1a5f5: Pull complete
3e879c9d7f8f: Pull complete
Digest: sha256:08c79287e8e2ca9ebe5dd2d800f4c424e44f7bb3e1de29f39a0002c14f2e0703
Status: Downloaded newer image for snakemake/snakemake:v5.14.0
Building DAG of jobs...
Creating conda environment workflow/envs/STAR.yaml...
Downloading and installing remote packages.
top - 21:48:01 up 1 min, 0 users, load average: 1.58, 0.62, 0.23
Tasks: 178 total, 3 running, 175 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.6 us, 5.3 sy, 0.3 ni, 83.0 id, 2.6 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 65332352 total, 61870388 free, 628820 used, 2833144 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 64136340 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7514 root 20 0 724856 232908 11708 S 100.0 0.4 0:06.07 conda-env
7534 root 20 0 65220 21112 4660 R 75.0 0.0 0:00.13 mon-put-in+
7535 root 20 0 65180 21252 4700 R 75.0 0.0 0:00.13 mon-put-in+
1 root 20 0 38028 6048 4020 S 0.0 0.0 0:02.13 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
4 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:+
4.0K /data1/input/
4.0K /data1/tmp/
20K /data1/out/
Environment for workflow/envs/STAR.yaml created (location: .snakemake/conda/a54d8bc7)
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 STAR
1

[Fri May 29 21:48:30 2020]
Job 0: ...Aligning LCS7616_RR_C1_SEQ5_1 with STAR two-pass...please wait...

Activating conda environment: /data1/snakemake/.snakemake/conda/a54d8bc7
[Fri May 29 21:48:31 2020]
Error in rule STAR:
jobid: 0
output: rnaseq.raw/results/alignment/LCS7616_RR_C1_SEQ5_1/Aligned.out.sam, rnaseq.raw/results/alignment/LCS7616_RR_C1_SEQ5_1/Log.final.out, rnaseq.raw/results/alignment/LCS7616_RR_C1_SEQ5_1/Aligned.toTranscriptome.out.bam, rnaseq.raw/results/alignment/LCS7616_RR_C1_SEQ5_1/Chimeric.out.junction
log: rnaseq.raw/logs/alignment/LCS7616_RR_C1_SEQ5_1.log (check log file(s) for error message)
conda-env: /data1/snakemake/.snakemake/conda/a54d8bc7
shell:

            STAR --genomeDir /Users/jeremygoldstein/data/refseq/Homo_Sapien/STAR_index              --runThreadN 4          --readFilesIn rnaseq.raw/results/trimming/LCS7616_RR_C1_SEQ5_1_R1_val_1.fq rnaseq.raw/results/trimming/LCS7616_RR_C1_SEQ5_1_R2_val_2.fq                --outFileNamePrefix rnaseq.raw/results/alignment/LCS7616_RR_C1_SEQ5_1/                 --outReadsUnmapped None                 --twopassMode Basic             --outSAMunmapped Within                --chimSegmentMin 12             --chimJunctionOverhangMin 8             --chimOutJunctionFormat 1               --alignSJDBoverhangMin 10              --alignMatesGapMax 100000               --alignIntronMax 100000                 --alignSJstitchMismatchNmax 5 -1 5 5          --chimMultimapScoreRange 3               --chimScoreJunctionNonGTAG -4           --chimMultimapNmax 20           --chimNonchimScoreDropMin 10          --peOverlapNbasesMin 12          --peOverlapMMp 0.1              --alignInsertionFlush Right             --alignSplicedMateMapLminOverLmate 0          --alignSplicedMateMapLmin 30             --quantMode TranscriptomeSAM            --quantTranscriptomeBan IndelSoftclipSingleend          --outSAMmapqUnique 60 &> rnaseq.raw/logs/alignment/LCS7616_RR_C1_SEQ5_1.log
            
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Uploading to remote: rnaseq.raw/logs/alignment/LCS7616_RR_C1_SEQ5_1.log
Finished upload.
Removing output files of failed job STAR since they might be corrupted:
rnaseq.raw/results/trimming/LCS7616_RR_C1_SEQ5_1_R1_val_1.fq, rnaseq.raw/results/trimming/LCS7616_RR_C1_SEQ5_1_R2_val_2.fq, rnaseq.raw/results/alignment/LCS7616_RR_C1_SEQ5_1/Aligned.out.sam, rnaseq.raw/results/alignment/LCS7616_RR_C1_SEQ5_1/Aligned.toTranscriptome.out.bam
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /data1/snakemake/.snakemake/log/2020-05-29T214754.346910.snakemake.log
Snakefile is being executed!
The following are the raw samples
['LCS7616_RR_C1_NT_1', 'LCS7616_RR_C1_NT_2', 'LCS7616_RR_C1_NT_3', 'LCS7616_RR_C1_SEQ5_1', 'LCS7616_RR_C1_SEQ5_2', 'LCS7616_RR_C1_SEQ5_3']

With the following criterion
cell lines: ['C1']
treatments: ['NT', 'SEQ5']
replicates: ['1', '2', '3']
reads: ['R1', 'R2']
Fri May 29 21:48:33 UTC 2020
/data1/out/:
total 20K
-rw-r--r-- 1 root root 0 May 29 21:48 hIeV9EGFDIMF.error
-rw-r--r-- 1 root root 64 May 29 21:48 hIeV9EGFDIMF.md5sum.txt
-rwxr-xr-x 1 ubuntu root 16K May 29 21:48 hIeV9EGFDIMF.log
total 36K
drwx--x--x 2 ubuntu root 16K May 29 21:46 lost+found
drwxr-xr-x 2 root root 4.0K May 29 21:46 reference
drwxr-xr-x 2 root root 4.0K May 29 21:46 input
drwxr-xr-x 2 root root 4.0K May 29 21:47 tmp
drwxr-xr-x 5 root root 4.0K May 29 21:47 snakemake
drwxr-xr-x 2 root root 4.0K May 29 21:48 out
/data1/input/:
total 0
/data1/snakemake/:
total 12K
drwxr-xr-x 4 root root 4.0K May 29 21:46 workflow
-rw-r--r-- 1 root root 2.5K May 29 21:46 Snakefile
drwxr-xr-x 4 root root 4.0K May 29 21:48 rnaseq.raw

/data1/snakemake/workflow:
total 8.0K
drwxr-xr-x 2 root root 4.0K May 29 21:46 rules
drwxr-xr-x 2 root root 4.0K May 29 21:46 envs

/data1/snakemake/workflow/rules:
total 12K
-rw-r--r-- 1 root root 1.1K May 29 21:46 trimGalore.smk
-rw-r--r-- 1 root root 1.9K May 29 21:46 STAR.smk
-rw-r--r-- 1 root root 566 May 29 21:46 fastQC.smk

/data1/snakemake/workflow/envs:
total 8.0K
-rw-r--r-- 1 root root 100 May 29 21:46 fastQC-trimGalore.yaml
-rw-r--r-- 1 root root 79 May 29 21:46 STAR.yaml

/data1/snakemake/rnaseq.raw:
total 8.0K
drwxr-xr-x 3 root root 4.0K May 29 21:48 logs
drwxr-xr-x 3 root root 4.0K May 29 21:48 results

/data1/snakemake/rnaseq.raw/logs:
total 4.0K
drwxr-xr-x 2 root root 4.0K May 29 21:48 alignment

/data1/snakemake/rnaseq.raw/logs/alignment:
total 4.0K
-rw-r--r-- 1 root root 402 May 29 21:48 LCS7616_RR_C1_SEQ5_1.log

/data1/snakemake/rnaseq.raw/results:
total 4.0K
drwxr-xr-x 3 root root 4.0K May 29 21:48 alignment

/data1/snakemake/rnaseq.raw/results/alignment:
total 4.0K
drwxr-xr-x 5 root root 4.0K May 29 21:48 LCS7616_RR_C1_SEQ5_1

/data1/snakemake/rnaseq.raw/results/alignment/LCS7616_RR_C1_SEQ5_1:
total 32K
drwx------ 2 root root 4.0K May 29 21:48 _STARtmp
drwx------ 2 root root 4.0K May 29 21:48 _STARpass1
drwx------ 2 root root 4.0K May 29 21:48 _STARgenome
-rw-r--r-- 1 root root 0 May 29 21:48 Log.progress.out
-rw-r--r-- 1 root root 20K May 29 21:48 Log.out

/data1/snakemake/rnaseq.raw/results/alignment/LCS7616_RR_C1_SEQ5_1/_STARtmp:
total 0

/data1/snakemake/rnaseq.raw/results/alignment/LCS7616_RR_C1_SEQ5_1/_STARpass1:
total 0

/data1/snakemake/rnaseq.raw/results/alignment/LCS7616_RR_C1_SEQ5_1/_STARgenome:
total 0
Traceback (most recent call last):
File "./aws_upload_output_update_json.py", line 157, in
raise Exception("output file {} upload to {} failed. %s".format(source, bucket + '/' + target) % e)
Exception: output file /data1/snakemake/rnaseq.raw/results/alignment/LCS7616_RR_C1_SEQ5_1/Aligned.out.sam upload to rnaseq.raw/results/alignment/LCS7616_RR_C1_SEQ5_1/Aligned.out.sam failed. [Errno 2] No such file or directory: '/data1/snakemake/rnaseq.raw/results/alignment/LCS7616_RR_C1_SEQ5_1/Aligned.out.sam'
Filesystem Size Used Avail Use% Mounted on
udev 32G 0 32G 0% /dev
tmpfs 6.3G 8.5M 6.3G 1% /run
/dev/nvme0n1p1 7.7G 4.3G 3.5G 56% /
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/nvme1n1 976M 11M 899M 2% /data1

STAR.txt

Job fails without error message

Hello, back again. I have an issue where my jobs are failing with an AWSEMJobErrorException error. However, the output files are being correctly produced and stored, so it doesn't seem like there is anything actually wrong. I looked at the metrics and it wasn't an issue of running out of memory. The only thing that looks wrong in the job log is this line: "Error response from daemon: Get https://registry-1.docker.io/v2/: unauthorized: incorrect username or password". Would you mind taking a look at the job log and letting me know whether that error could be causing the job failure, or whether it is something else? When I run the command with Snakemake locally, with the input and output files in the same S3 buckets, there are no issues.

The metadata for the output files indicates that they are complete, because if I try to run a rule that depends on them, I don't get the "incomplete files" error that usually occurs when an output file is tagged as incomplete in the metadata. So in every sense the run seems to be successful, except that it says it failed. My current workaround has been to use the -k option with Snakemake to ignore a failed job and keep going. But this makes me unable to group my jobs and obviously isn't exactly a solution to the problem.

Also, every job seems to run for about 5 minutes before failing, if that helps. Thanks
runlog.txt

Snakemake target declaration breaks with tibanna

I have a snakemake workflow that includes a rule "busco". Rule below...

rule busco:
    input:
        "{base}.fasta"
    output:
        "busco_summary_{base}.txt",
        "busco_table_{base}.tsv"
    params:
        outbase = lambda wcs: wcs["base"],
        lineage = lambda wcs: config["busco"],
        summary = lambda wcs: f"{wcs['base']}/run_{config['busco']}/short_summary.txt",
        table = lambda wcs: f"{wcs['base']}/run_{config['busco']}/full_table.tsv",
        opt = opts.get("busco", "")
    threads:
        32
    conda:
        "envs/general.yml"
    shell:
        """
        busco {params.opt} -i {input[0]} -c {threads} -o {params.outbase} -l {params.lineage}  --mode geno
        mv {params.summary} {output[0]}
        mv {params.table} {output[1]}
        """

In plain English, this rule will run busco on some fasta file using the busco database specified by the config.

When I try to run this rule in this workflow, snakemake/tibanna seems unable to meaningfully resolve the wildcards. I get a "MissingRuleException". When I run the same command without the tibanna flag, everything works fine. Command for reference...

snakemake busco_summary_Vaccinium_oxycoccos_genome_v1.txt \
    -p --use-conda --tibanna \
    --default-remote-prefix mybucket/cranberry/oxycoccos/ \
    --default-resources disk_mb=200000 \
    --config basename=Vaccinium_oxycoccos_genome_v1 busco=viridiplantae_odb10 \
    --tibanna-config log_bucket=my-log-bucket

This behavior confuses me. Tibanna seems to be breaking the wildcard logic somehow. Running in verbose mode produces the following useless error...

Building DAG of jobs...
Full Traceback (most recent call last):
  File "/home/ubuntu/workflows/snakemake/snakemake/__init__.py", line 699, in snakemake
    keepincomplete=keep_incomplete,
  File "/home/ubuntu/workflows/snakemake/snakemake/workflow.py", line 641, in execute
    dag.init()
  File "/home/ubuntu/workflows/snakemake/snakemake/dag.py", line 172, in init
    job = self.update(self.file2jobs(file), file=file, progress=progress)
  File "/home/ubuntu/workflows/snakemake/snakemake/dag.py", line 1467, in file2jobs
    raise MissingRuleException(targetfile)
snakemake.exceptions.MissingRuleException: No rule to produce busco_summary_Vaccinium_oxycoccos_genome_v1.txt (if you use input functions make sure that they don't raise unexpected exceptions).

MissingRuleException:
No rule to produce busco_summary_Vaccinium_oxycoccos_genome_v1.txt (if you use input functions make sure that they don't raise unexpected exceptions).

Can Tibanna pull containers from the AWS Elastic Container Registry (ECR)?

Thanks a lot for making Tibanna available. I am using the Elastic Container Registry on AWS to store my (private) containers.

Is Tibanna able to pull containers from ECR instead of Docker Hub? (AWS Batch supports ECR out of the box.)

The main question, I believe, is whether a policy that grants access to ECR can be added to the tibanna_ IAM group, and whether it is passed on to the EC2 instances Tibanna creates.
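
For reference, the kind of policy statement in question would grant the standard ECR read actions (a generic sketch of such a policy; whether Tibanna attaches anything like it automatically is the open question here):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchGetImage",
                "ecr:GetDownloadUrlForLayer"
            ],
            "Resource": "*"
        }
    ]
}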

Many thanks for any thoughts,
Thomas

AttributeError: 'Workflow' object has no attribute 'overwrite_configfile'

Following the Snakemake tutorial, I'm getting the following error when I try to run a pipeline:

$ snakemake --tibanna --default-remote-prefix=tibanna-genovic/test_data
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 9223372036854775807
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	all
	2	call_variants
	1	combine_calls
	1	genotype_variants
	2	hard_filter_calls
	3	map_reads
	3	mark_duplicates
	1	merge_calls
	1	merge_variants
	1	multiqc
	1	plot_stats
	3	recalibrate_base_qualities
	3	samtools_stats
	2	select_calls
	1	snpeff
	2	trim_reads_pe
	1	trim_reads_se
	1	vcf_to_tsv
	30
running job using Tibanna...

[Tue Oct 15 13:04:57 2019]
rule trim_reads_pe:
    input: tibanna-genovic/test_data/data/reads/b.chr21.1.fq, tibanna-genovic/test_data/data/reads/b.chr21.2.fq
    output: tibanna-genovic/test_data/trimmed/B-1.1.fastq.gz, tibanna-genovic/test_data/trimmed/B-1.2.fastq.gz, tibanna-genovic/test_data/trimmed/B-1.1.unpaired.fastq.gz, tibanna-genovic/test_data/trimmed/B-1.2.unpaired.fastq.gz, tibanna-genovic/test_data/trimmed/B-1.trimlog.txt
    log: tibanna-genovic/test_data/logs/trimmomatic/B-1.log
    jobid: 25
    wildcards: sample=B, unit=1
    resources: mem_mb=3684, disk_mb=3684

Traceback (most recent call last):
  File "/tmp/tmp.1BY3tqvG9v/venv/lib/python3.6/site-packages/snakemake/__init__.py", line 611, in snakemake
    export_cwl=export_cwl,
  File "/tmp/tmp.1BY3tqvG9v/venv/lib/python3.6/site-packages/snakemake/workflow.py", line 775, in execute
    success = scheduler.schedule()
  File "/tmp/tmp.1BY3tqvG9v/venv/lib/python3.6/site-packages/snakemake/scheduler.py", line 365, in schedule
    self.run(job)
  File "/tmp/tmp.1BY3tqvG9v/venv/lib/python3.6/site-packages/snakemake/scheduler.py", line 384, in run
    error_callback=self._error,
  File "/tmp/tmp.1BY3tqvG9v/venv/lib/python3.6/site-packages/snakemake/executors.py", line 1829, in run
    tibanna_input = self.make_tibanna_input(job)
  File "/tmp/tmp.1BY3tqvG9v/venv/lib/python3.6/site-packages/snakemake/executors.py", line 1812, in make_tibanna_input
    self.add_workflow_files(job, tibanna_args)
  File "/tmp/tmp.1BY3tqvG9v/venv/lib/python3.6/site-packages/snakemake/executors.py", line 1723, in add_workflow_files
    self.workflow.overwrite_configfile, snakemake_dir
AttributeError: 'Workflow' object has no attribute 'overwrite_configfile'

Job fails even though it was actually successful

Hello,

I have been running a Snakemake workflow through Tibanna, and have been having the strange issue of a job failing with no indication as to why it failed. When I look at the run's log, there are no errors and everything seems successful, and the output files appeared in the S3 bucket, but I am curious as to why it is saying it is failing. Attached is the log file of a run that says it failed. I also attached the lambda log.
lamda.log
When I looked at the log for the tool, there were also no errors. Any advice on what is going on?
eMxeZbpJjp9V.log

mounting error for newer instance types

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme1n1 259:3 0 100G 0 disk
nvme0n1 259:0 0 8G 0 disk
├─nvme0n1p1 259:1 0 8G 0 part /
└─nvme0n1p128 259:2 0 1M 0 part
mke2fs 1.42.12 (29-Aug-2014)
The file /dev/xvdb does not exist and no size was specified.
mount: special device /dev/xvdb does not exist

This happened reproducibly with c5.xlarge (4 cases).
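
For context: nitro-based instance types like c5 expose EBS volumes as /dev/nvme*n1 rather than /dev/xvdb, so formatting and mounting a hard-coded /dev/xvdb fails there. A rough sketch of the kind of device-name fallback involved (not Tibanna's actual launch script):

# pick the data device: nvme name on nitro instances, xvdb otherwise
if [ -b /dev/nvme1n1 ]; then DATA_DEV=/dev/nvme1n1; else DATA_DEV=/dev/xvdb; fi
mkfs -t ext4 "$DATA_DEV"
mkdir -p /data1 && mount "$DATA_DEV" /data1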

Confirmation of Test Run

Hi,

I ran tibanna run_workflow --input-json=test_json/unicorn/my_test_tibanna_bucket.json and it outputted this:

about to start run md5-public-test
{'ResponseMetadata': {'RequestId': 'TGPTV5BMPBBCR060675KUHPLSFVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Wed, 13 Nov 2019 21:58:12 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '2', 'connection': 'keep-alive', 'x-amzn-requestid': 'TGPTV5BMPBBCR060675KUHPLSFVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '2745614147'}, 'RetryAttempts': 0}}
response from aws was:
{'executionArn': 'arn:aws:states:us-east-2:558013456147:execution:tibanna_unicorn_test:md5-public-test', 'startDate': datetime.datetime(2019, 11, 13, 21, 58, 9, 357000, tzinfo=tzlocal()), 'ResponseMetadata': {'RequestId': '9b3a5047-68be-4715-b5e3-f61e19f3b85d', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '9b3a5047-68be-4715-b5e3-f61e19f3b85d', 'content-type': 'application/x-amz-json-1.0', 'content-length': '132'}, 'RetryAttempts': 0}}
url to view status:
https://console.aws.amazon.com/states/home?region=us-east-2#/executions/details/arn:aws:states:us-east-2:558013456147:execution:tibanna_unicorn_test:md5-public-test
JOBID 5ZSMVaDTqcIa submitted
EXECUTION ARN = arn:aws:states:us-east-2:558013456147:execution:tibanna_unicorn_test:md5-public-test
Couldn't get a file descriptor referring to the console

Using the URL to view status, it said it executed successfully. I just wanted to know what the above output describes, and whether there are any errors. Thanks!

Tibanna jobs failing unintendedly

Every Tibanna job I kick off runs for about five minutes and then dies. This started happening to all of my Snakemake workflows some time in the last week, which is the last time I tried to run any jobs. I'm having trouble figuring out what is going wrong.

I've attached the Tibanna log for the job, even though I don't see anything informative in it: log_hSpIncnfdaf1.txt

CloudWatch logs were a bit more interesting. It seems like everything goes normally and then this error shows up.

EC2 is terminated unintendedly for job hSpIncnfdaf1 - please rerun.: EC2UnintendedTerminationException
Traceback (most recent call last):
  File "/var/task/service.py", line 19, in handler
    return check_task(event)
  File "/var/task/tibanna/check_task.py", line 33, in check_task
    return CheckTask(input_json).run()
  File "/var/task/tibanna/check_task.py", line 108, in run
    raise EC2UnintendedTerminationException(errmsg)
tibanna.exceptions.EC2UnintendedTerminationException: EC2 is terminated unintendedly for job hSpIncnfdaf1 - please rerun.

Trouble Running Snakemake via Tibanna

Hi,

I am running a custom Snakemake workflow via Tibanna. However, the runs fail and it does not deposit the Snakemake log to the corresponding S3 bucket I deployed. I have attached the log file here for your review. Any suggestions?

This is the command I used: snakemake --tibanna --default-remote-prefix=tibannatestdata -s tibanna_ichorCNA_snakemake.mk

Thank you!

yuWQY5gCrdfz.log

meta workflow handling for pony

e.g. Koray submits data for the chip-seq pipeline as a whole and it spawns multiple tibanna runs with proper dependencies.

Tibanna workflow for an AWS role

I'm attempting to run a Snakemake workflow on AWS using Tibanna, following the instructions here.

I have an AWS project that has no IAM users. Instead, we have a role that all users (with sufficient permissions) can assume in order to access the project.

So while following the tutorial, I can get up to tibanna deploy_unicorn -g tibanna -b my_tibanna_bucket, but then I get stuck at tibanna add_user -u <username> -g <name>. Like I said, the project has no users, so this command doesn't make sense. Is there a way to add a role to the user group instead?

WDL/cromwell genomics workflow failing even though cromwell succeeded

Hi There
I am using tibanna version 0.16.0

I have a WDL workflow I'm running on my regular compute resources and I wanted to test it on Tibanna. After setting up the input JSON with inputs, I ran it, and it seems the outputs are created and Cromwell succeeded.

Below is an excerpt from the log

[2020-03-14 21:03:40,88] [info] WorkflowManagerActor WorkflowActor-wfid123 is in a terminal state: WorkflowSucceededState
[2020-03-14 21:03:52,07] [info] SingleWorkflowRunnerActor workflow finished with status 'Succeeded'.
{
  "outputs": {
    "bwaAmpliconWorkflow.fsummary": "/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-featureCounts/execution/mysample.fc.txt.summary",
    "bwaAmpliconWorkflow.vepTSV": "/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-ensVEP/execution/mysample.vep.tsv",
    "bwaAmpliconWorkflow.final_calls": "/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-compressVCF/execution/mysample.vcf.gz",
    "bwaAmpliconWorkflow.veplogfiles": ["/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-vepToRDS/execution/glob-e91fbb25c2b9853fe22ffddf6fda6c39/mergevep.log.stderr.txt", "/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/
    "bwaAmpliconWorkflow.vardict_vcf": "/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-varDict/execution/mysample.vcf",
    "bwaAmpliconWorkflow.varstats": "/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-vcfStats/execution/mysample.vcfstats.txt",
    "bwaAmpliconWorkflow.mqcdatafiles": ["/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-multiqc/execution/glob-a34db49fda6452b05da185003a41a644/multiqc_bcftools_stats.txt", "/data1/wdl/cromwell-executions/bwaAmpliconWorkfl
    "bwaAmpliconWorkflow.samalignstatsout": "/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-samAlignstats/execution/mysample.txt",
    "bwaAmpliconWorkflow.multiqc_report": "/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-multiqc/execution/multiqc_report.html",
    "bwaAmpliconWorkflow.fastqczfiles": ["/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-fastqc/execution/glob-ce15bb98ad750635e87174959a5d6b8a/mysample_fastqc.zip"],
    "bwaAmpliconWorkflow.vepfiles": ["/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-vepToRDS/execution/glob-ee3cac44669b7460fea102fa893f1b06/mysample.geneFrequency.tsv", "/data1/wdl/cromwell-executions/bwaAmpliconWo
    "bwaAmpliconWorkflow.final_tbi": "/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-compressVCF/execution/mysample.vcf.gz.tbi",
    "bwaAmpliconWorkflow.final_alignment_index": "/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-bwaAlign/execution/mysample.bam.bai",
    "bwaAmpliconWorkflow.vcfinfo": "/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-vcfToTsv/execution/mysample.info.tsv",
    "bwaAmpliconWorkflow.featuresout": "/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-featureCounts/execution/mysample.fc.txt",
    "bwaAmpliconWorkflow.final_alignment": "/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-bwaAlign/execution/mysample.bam",
    "bwaAmpliconWorkflow.alignstats": "/data1/wdl/cromwell-executions/bwaAmpliconWorkflow/wfid123/call-bwaAlign/execution/mysample.alnmetrics.txt"
  },
  "id": "wfid123"
}

[2020-03-14 21:03:52,76] [info] SingleWorkflowRunnerActor writing metadata to /data1/out/LfOYPUMIiw14.log.json
[2020-03-14 21:03:52,78] [info] Workflow polling stopped
[2020-03-14 21:03:52,79] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2020-03-14 21:03:52,80] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2020-03-14 21:03:52,80] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2020-03-14 21:03:52,80] [info] JobExecutionTokenDispenser stopped
[2020-03-14 21:03:52,80] [info] Aborting all running workflows.
[2020-03-14 21:03:52,81] [info] WorkflowStoreActor stopped
[2020-03-14 21:03:52,81] [info] WorkflowLogCopyRouter stopped
[2020-03-14 21:03:52,81] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2020-03-14 21:03:52,81] [info] WorkflowManagerActor All workflows finished
[2020-03-14 21:03:52,81] [info] WorkflowManagerActor stopped
[2020-03-14 21:03:52,81] [info] Connection pools shut down
[2020-03-14 21:03:52,81] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2020-03-14 21:03:52,81] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2020-03-14 21:03:52,81] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2020-03-14 21:03:52,81] [info] SubWorkflowStoreActor stopped
[2020-03-14 21:03:52,81] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2020-03-14 21:03:52,81] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2020-03-14 21:03:52,81] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2020-03-14 21:03:52,82] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2020-03-14 21:03:52,82] [info] DockerHashActor stopped
[2020-03-14 21:03:52,82] [info] IoProxy stopped
[2020-03-14 21:03:52,82] [info] CallCacheWriteActor stopped
[2020-03-14 21:03:52,82] [info] KvWriteActor Shutting down: 0 queued messages to process
[2020-03-14 21:03:52,82] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2020-03-14 21:03:52,82] [info] JobStoreActor stopped
[2020-03-14 21:03:52,82] [info] ServiceRegistryActor stopped
[2020-03-14 21:03:52,84] [info] Database closed
[2020-03-14 21:03:52,84] [info] Stream materializer shut down
[2020-03-14 21:03:52,84] [info] WDL HTTP import resolver closed
Sat Mar 14 21:03:54 UTC 2020
/data1/out/:
total 156K
-rw-r--r-- 1 root   root 67K Mar 14 21:03 LfOYPUMIiw14.log.json
-rw-r--r-- 1 root   root   0 Mar 14 21:03 LfOYPUMIiw14.md5sum.txt
-rwxr-xr-x 1 ubuntu root 81K Mar 14 21:03 LfOYPUMIiw14.log
total 36K
drwx--x--x 2 ubuntu root  16K Mar 14 20:48 lost+found
drwxr-xr-x 2 root   root 4.0K Mar 14 20:48 reference

And in the workflow log from tibanna log -j <jobid> the error I see is

Traceback (most recent call last):
  File "./aws_upload_output_update_json.py", line 108, in <module>
    if ofv['path'] in md5dict:
TypeError: unhashable type: 'list'
Filesystem      Size  Used Avail Use% Mounted on
udev            7.4G     0  7.4G   0% /dev
tmpfs           1.5G  8.5M  1.5G   1% /run
/dev/xvda1       20G   11G  8.8G  55% /
tmpfs           7.4G     0  7.4G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           7.4G     0  7.4G   0% /sys/fs/cgroup
/dev/xvdb       118G   24G   89G  22% /data1

And in my workflow input JSON I did not specify every single output target from my workflow because there are too many (unless I script it), so I set output_target to an empty dict, e.g.

"output_target": {}

I can clearly see from the logs that the Cromwell run succeeds, but because the Python script aws_upload_output_update_json.py fails, the whole task is put into a failed state.
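
For what it's worth, the TypeError above fires because some of the workflow outputs are arrays of files (the glob outputs), so ofv['path'] is a list and can't be used as a dict key. A sketch of the kind of guard that would avoid it (an illustration only, not the actual upstream fix):

# ofv['path'] may be a single path or a list of paths (e.g. WDL glob outputs)
paths = ofv['path'] if isinstance(ofv['path'], list) else [ofv['path']]
for p in paths:
    if p in md5dict:
        pass  # look up / record the md5 as before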

Any help or info to resolve this will be much appreciated

Job fails without error message

Well, I'm back. Everything seemed to be working so well: the workflow ran perfectly on several datasets. On a particularly large dataset, it is now failing without an error message after running for about three days. Output and log below...

wm7004.log1.txt
wm7004.out1.txt

Just like last time, it seems to be the idle instance detection that is biting me. The last couple of real entries, including the error from CloudWatch, are below...


2020-05-30T14:45:30.508-07:00
2020-05-30T21:45:30.508Z e0b9a9bb-6394-44c2-8f09-b03806e1f618	{'instance_id': 'i-0f34a53ae2e553cca', 'filesystem': '/dev/nvme1n1', 'client': <botocore.client.CloudWatch object at 0x7ff5cfbd1ef0>, 'starttimes': [datetime.datetime(2020, 5, 30, 20, 45, 29, 693085, tzinfo=tzutc())], 'endtimes': [datetime.datetime(2020, 5, 31, 20, 45, 29, 693085, tzinfo=tzutc())], 'start': datetime.datetime(2020, 5, 30, 20, 45, 29, tzinfo=tzutc()), 'end': datetime.datetime(2020, 5, 30, 21, 45, 29, tzinfo=tzutc()), 'nTimeChunks': 1, 'list_files': [], 'starttime': datetime.datetime(2020, 5, 30, 20, 45, 29, 693085, tzinfo=tzutc()), 'endtime': datetime.datetime(2020, 5, 31, 20, 45, 29, 693085, tzinfo=tzutc()), 'max_mem_used_MB': 57764.3828125, 'min_mem_available_MB': 5531.6875, 'total_mem_MB': 63296.0703125, 'max_mem_utilization_percent': 91.26061464370629, 'max_cpu_utilization_percent': '', 'max_disk_space_utilization_percent': 13.0300469215003, 'max_disk_space_used_GB': 62.7008895874023, 'max_ebs_read_bytes': ''}

2020-05-30T14:45:30.801-07:00
not enough arguments for format string: TypeError
Traceback (most recent call last):
  File "/var/task/service.py", line 19, in handler
    return check_task(event)
  File "/var/task/tibanna/check_task.py", line 33, in check_task
    return CheckTask(input_json).run()
  File "/var/task/tibanna/check_task.py", line 124, in run
    cw_res['max_ebs_read_bytes'])
  File "/var/task/tibanna/check_task.py", line 142, in terminate_idle_instance
    jobid, str(cpu), str(e)
TypeError: not enough arguments for format string

I launched this unicorn using a pip-installed version of tibanna 0.18.0, which is older than my pull request that fixed this error formatting.

All this said, I basically have three questions I'm looking for help with...

  1. If I clone your repo, which includes my updates to this error reporting, is there any way to easily update the step functions on AWS? Last time around, I just redeployed a new unicorn, which works but bifurcates my job history and results in even more step functions to manage on AWS.

  2. Whenever I try to run 'plot_metrics' on a job that finished more than a couple of hours ago, I get the following error. Any idea what causes it?

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/snake/bin/tibanna", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/miniconda3/envs/snake/lib/python3.6/site-packages/tibanna/__main__.py", line 437, in main
    subcommandf(*sc_args)
  File "/home/ubuntu/miniconda3/envs/snake/lib/python3.6/site-packages/tibanna/__main__.py", line 382, in plot_metrics
    endtime=endtime, open_browser=not do_not_open_browser)
  File "/home/ubuntu/miniconda3/envs/snake/lib/python3.6/site-packages/tibanna/core.py", line 1005, in plot_metrics
    raise Exception("instance id not available for this run.")
Exception: instance id not available for this run.
  3. Any idea why my job would be crashing in a way that triggers the idle instance checks? The job ran for several days without issue.

Issues running with Snakemake

Hello,

I am having some issues running my Snakemake RNAseq pipeline with Tibanna on AWS. I am currently trying to run a single rule that runs FastQC on remote files. I used deploy_unicorn to hook Tibanna up to the S3 bucket with the correct input files. The relevant part of my Snakefile looks like this:

rule all:
    input:
        expand("results/raw-QC/{sample}_{read}_fastqc.zip", sample=samples, read=reads)

include: 'workflow/rules/fastQC.smk'

and my fastQC.smk file looks like this

rule fastQC:
    input: "LCS7616_test/{sample}_{read}.fq"
    output: "results/raw-QC/{sample}_{read}_fastqc.zip"
    conda: "../envs/fastQC-trimGalore.yaml"
    message: "...Checking the quality of {input}...please wait..."
    log: "logs/raw-qc/{sample}_{read}.log"  #'results/logs/{sample}raw_fastqc.log'
    shell: 'fastqc {input} --outdir "results/raw-QC" &> {log}'

My S3 bucket is called rnaseq.test and has the right folder structure to work with these commands. However, when I try to run with the command

<snakemake --tibanna --default-remote-prefix=rnaseq.raw --verbose>

it fails with no additional information, so I do not know how to proceed. Below is the log file of the run:

Building DAG of jobs...
sources=/Users/jeremygoldstein/Snakemake/workflow/rules/fastQC.smk/Users/jeremygoldstein/Snakemake/workflow/envs/fastQC-trimGalore.yaml/Users/jeremygoldstein/Snakemake/Snakefile
precommand=
bucket=rnaseq.raw
subdir=rnaseq.raw
Using shell: /bin/bash
Provided cores: 9223372036854775807
Rules claiming more threads will be scaled down.
Conda environments: ignored
Job counts:
count jobs
1 all
12 fastQC
13
Resources before job selection: {'_cores': 9223372036854775807, '_nodes': 1}
Ready jobs (12):
fastQC
fastQC
fastQC
fastQC
fastQC
fastQC
fastQC
fastQC
fastQC
fastQC
fastQC
fastQC
Selected jobs (1):
fastQC
Resources after job selection: {'_cores': 9223372036854775806, '_nodes': 0}
running job using Tibanna...

[Mon May 11 15:01:42 2020]
Job 8: ...Checking the quality of rnaseq.raw/LCS7616_test/LCS7616_RR_C1_SEQ5_1_R2.fq...please wait...

job input rnaseq.raw/LCS7616_test/LCS7616_RR_C1_SEQ5_1_R2.fq
job input is remote= true
is remote default= true
job output rnaseq.raw/results/raw-QC/LCS7616_RR_C1_SEQ5_1_R2_fastqc.zip
job output is remote= true
is remote default= true
additional tibanna config: None
command = snakemake rnaseq.raw/results/raw-QC/LCS7616_RR_C1_SEQ5_1_R2_fastqc.zip --snakefile Snakefile --force -j1 --keep-target-files --keep-remote --latency-wait 0 --attempt 1 --force-use-threads --allowed-rules fastQC --nocolor --notemp --no-hooks --nolock
{
"jobid": "pyrplauyhhNz",
"config": {
"run_name": "snakemake-job-pyrplauyhhNz-rule-fastQC",
"mem": 0.9765625,
"cpu": 1,
"ebs_size": 1,
"log_bucket": "rnaseq.raw"
},
"args": {
"output_S3_bucket": "rnaseq.raw",
"language": "snakemake",
"container_image": "snakemake/snakemake:v5.14.0",
"input_files": {
"file:///data1/snakemake/rnaseq.raw/LCS7616_test/LCS7616_RR_C1_SEQ5_1_R2.fq": "s3://rnaseq.raw/LCS7616_test/LCS7616_RR_C1_SEQ5_1_R2.fq"
},
"output_target": {
"file:///data1/snakemake/rnaseq.raw/results/raw-QC/LCS7616_RR_C1_SEQ5_1_R2_fastqc.zip": "s3://rnaseq.raw/results/raw-QC/LCS7616_RR_C1_SEQ5_1_R2_fastqc.zip",
"file:///data1/snakemake/rnaseq.raw/logs/raw-qc/LCS7616_RR_C1_SEQ5_1_R2.log": "s3://rnaseq.raw/logs/raw-qc/LCS7616_RR_C1_SEQ5_1_R2.log"
},
"input_env": {},
"snakemake_directory_local": "/Users/jeremygoldstein/Snakemake",
"snakemake_main_filename": "Snakefile",
"snakemake_child_filenames": [
"workflow/envs/fastQC-trimGalore.yaml",
"workflow/rules/fastQC.smk"
],
"command": "snakemake rnaseq.raw/results/raw-QC/LCS7616_RR_C1_SEQ5_1_R2_fastqc.zip --snakefile Snakefile --force -j1 --keep-target-files --keep-remote --latency-wait 0 --attempt 1 --force-use-threads --allowed-rules fastQC --nocolor --notemp --no-hooks --nolock "
}
}
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: RUNNING
job snakemake-job-pyrplauyhhNz-rule-fastQC: FAILED
Cleanup job metadata.
Shutting down, this might take some time.
shutting down Tibanna executor
Exiting because a job execution failed. Look above for error message
Complete log: /Users/jeremygoldstein/Snakemake/.snakemake/log/2020-05-11T150135.858465.snakemake.log
unlocking
removing lock
removing lock
removed all locks

Is there any advice on how to proceed from here? Thanks.

AWS EC2 privacy and security

Hi Soo Lee,
We are going to use Tibanna with a great scenario for one of the Israeli Universities.
We would like to run Tibanna on AWS, and for that we would love to develop some features. We need to understand what the current status is and whether you can help out with the design.

Two practical questions -

  1. Is there a way to add the new EC2 instances to a specific security group, VPC, etc.?
  2. How does scale-up and scale-down work for the EC2 instances? How do we define them?

Do you think we can jump to a call and discuss it further?

Thanks,
Roi Shillo

Tibanna metrics aren't being gathered

Hello!

I am using snakemake with tibanna. Is there something I have to enable/configure in order for metrics to be gathered? The Metrics section of my postrun json is empty:

"Metrics": {
            "max_mem_used_MB": "",
            "min_mem_available_MB": "",
            "total_mem_MB": "",
            "max_mem_utilization_percent": "",
            "max_cpu_utilization_percent": "",
            "max_disk_space_utilization_percent": "",
            "max_disk_space_used_GB": "",
            "max_ebs_read_bytes": ""
}

Thanks!

Lambda function failed with KeyError when JSON is large

I get a KeyError in the check_task lambda function when there are a lot of outputs (presumably making the JSON length exceed the limit).

{
  "error": "KeyError",
  "cause": {
    "errorMessage": "'commands'",
    "errorType": "KeyError",
    "stackTrace": [
      [
        "/var/task/service.py",
        19,
        "handler",
        "return check_task(event)"
      ],
      [
        "/var/task/tibanna/check_task.py",
        33,
        "check_task",
        "return CheckTask(input_json).run()"
      ],

Hitting vCPU limit

Hi,
While running Tibanna with Snakemake I keep hitting my vCPU limit. While Snakemake supports a job limit for cluster executions (-j), it doesn't use that for Tibanna. Might this be something that Tibanna should consider instead?

Job dies mysteriously

I have a snakemake workflow that runs locally without issue, using conda environment definitions that also seem to work fine. I can't get it to run through tibanna. As far as I can tell everything goes fine, and then about an hour into job execution the run just dies with no error messages of any kind. I've attached the snakemake output, the tibanna log file, and the Snakefile. If someone could give me some pointers on how to debug this, I'd really appreciate it.

outtest1.log
tibanna1.log
Snakefile.txt

As far as I can tell, tibanna is erroring out during the first job, "edta". I can clearly see some of the tool's output in "tibanna1.log", but the tool never seems to finish running. I haven't had any problem running the tool on identical fasta files locally using the same 'edta.yml' definition. It's incredibly frustrating that I can't find any error messages here.

Pip alert when running deploy_unicorn

Just now I ran tibanna deploy_unicorn, and, while it completed successfully, it did emit an Error that might be relevant. Just posting it here, in case you would like to investigate it. For all I know, maybe this isn't a problem with tibanna, but could be a problem with some packages upstream?

ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

docker snakemake image includes ancient singularity version

I'm trying to use snakemake with tibanna. As part of this, I have some containers that let me run my workflows. I had been using singularity-hub to host my containers, but it has a quota/limit on number of pulls per week.

My initial solution was to just host the images in my AWS bucket and pull them using https addresses. New versions of Singularity handle this just fine, at least as of 3.5.3, which is the version I tend to use. The version of Singularity packaged with the snakemake docker container is 2.6.1. Is there some good reason the Singularity version is so old, or has it just been overlooked for a while?
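
For reference, the workaround above assumes a Singularity version new enough to pull straight over HTTPS; a hedged sketch of the kind of command involved (bucket and file names are made up):

# works on Singularity 3.x; the 2.6.1 bundled in the snakemake image does not support this
singularity pull mytool.sif https://my-bucket.s3.amazonaws.com/containers/mytool.sif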

Tibanna Group Jobs Upload Files in Apparently Arbitrary Order

I have a hybrid assembly workflow that mostly works fine. However, I've noticed that when I try to continue the workflow after a successful partial run, tibanna wants to go through the trouble of rerunning the last job it already finished. For reference, these are the contents of the default remote prefix dir...

2020-08-13 17:38:18          0
2020-09-20 05:50:08 3683255812 Pennycress_1326_BWA_002_S2_R1_001.fastq.gz
2020-09-20 05:51:24 3720172921 Pennycress_1326_BWA_002_S2_R2_001.fastq.gz
2020-08-18 04:18:40        553 busco_summary_penny_1326.flye.racon3.txt
2020-08-18 04:18:49     139349 busco_table_penny_1326.flye.racon3.tsv
2020-08-17 23:06:17  430122007 penny_1326.flye.fasta
2020-08-17 23:06:19  809289877 penny_1326.flye.gfa
2020-08-18 04:18:37  420515794 penny_1326.flye.racon1.fasta
2020-08-18 04:18:44  419861456 penny_1326.flye.racon2.fasta
2020-08-18 04:18:40  419636936 penny_1326.flye.racon3.fasta
2020-08-16 23:54:05 7354043622 pennycress_1326.S003B2.S004B2.all.dedup.fastq.gz
2020-08-13 18:24:14 9676546743 pennycress_1326.S003B2.S004B2.all.fastq.gz

Note that "penny_1326.flye.racon3.fasta" exists. It was produced at the same time both the other racon runs were completed. We recently completed short read sequencing and are now polishing the assembly using the short reads (and a tool called Pilon). This operation accepts "penny_1326.flye.racon3.fasta" as input and improves the sequence quality by aligning short reads and editing the assembly as needed. This all seems normal but when I go to run the workflow, snakemake/tibanna wants to remake "penny_1326.flye.racon3.fasta". See the dry run here...

Building DAG of jobs...
Job counts:
        count   jobs
        1       all
        1       busco
        1       pilon
        1       racon
        4
[Sun Sep 20 16:40:08 2020]

group job polish (jobs in lexicogr. order):

    [Sun Sep 20 16:40:08 2020]
    rule busco:
        input: salk-tm-dev/pennycress/1326/penny_1326.flye.racon3.pilon1.fasta
        output: salk-tm-dev/pennycress/1326/busco_summary_penny_1326.flye.racon3.pilon1.txt, salk-tm-dev/pennycress/1326/busco_table_penny_1326.flye.racon3.pilon1.tsv
        jobid: 2
        wildcards: base=penny_1326.flye.racon3.pilon1
        threads: 16
        resources: disk_mb=1000000, mem_mb=60000


        busco  -i salk-tm-dev/pennycress/1326/penny_1326.flye.racon3.pilon1.fasta -c 16 -o penny_1326.flye.racon3.pilon1 -l eudicots_odb10  --mode geno
        mv penny_1326.flye.racon3.pilon1/run_eudicots_odb10/short_summary.txt salk-tm-dev/pennycress/1326/busco_summary_penny_1326.flye.racon3.pilon1.txt
        mv penny_1326.flye.racon3.pilon1/run_eudicots_odb10/full_table.tsv salk-tm-dev/pennycress/1326/busco_table_penny_1326.flye.racon3.pilon1.tsv


    [Sun Sep 20 16:40:08 2020]
    rule pilon:
        input: salk-tm-dev/pennycress/1326/penny_1326.flye.racon3.fasta, salk-tm-dev/pennycress/1326/Pennycress_1326_BWA_002_S2_R1_001.fastq.gz, salk-tm-dev/pennycress/1326/Pennycress_1326_BWA_002_S2_R2_001.fastq.gz
        output: salk-tm-dev/pennycress/1326/penny_1326.flye.racon3.pilon1.fasta
        jobid: 1
        wildcards: base=penny_1326.flye.racon3, n=1
        threads: 16
        resources: disk_mb=1000000, mem_mb=60000


        minimap2  -ax sr -t 16 salk-tm-dev/pennycress/1326/penny_1326.flye.racon3.fasta salk-tm-dev/pennycress/1326/Pennycress_1326_BWA_002_S2_R1_001.fastq.gz salk-tm-dev/pennycress/1326/Pennycress_1326_BWA_002_S2_R2_001.fastq.gz | samtools sort > salk-tm-dev/pennycress/1326/penny_1326.flye.racon3.pilon1.fasta.mm2.sr.bam
        samtools index salk-tm-dev/pennycress/1326/penny_1326.flye.racon3.pilon1.fasta.mm2.sr.bam
        pilon  -Xmx54000M --genome salk-tm-dev/pennycress/1326/penny_1326.flye.racon3.fasta --bam salk-tm-dev/pennycress/1326/penny_1326.flye.racon3.pilon1.fasta.mm2.sr.bam --output penny_1326.flye.racon3.pilon1 --threads 16


    [Sun Sep 20 16:40:08 2020]
    rule racon:
        input: salk-tm-dev/pennycress/1326/penny_1326.flye.racon2.fasta, salk-tm-dev/pennycress/1326/pennycress_1326.S003B2.S004B2.all.dedup.fastq.gz
        output: salk-tm-dev/pennycress/1326/penny_1326.flye.racon3.fasta
        jobid: 3
        wildcards: base=penny_1326.flye, n=3
        threads: 16
        resources: disk_mb=1000000, mem_mb=60000


        minimap2  -x map-ont -t 16 salk-tm-dev/pennycress/1326/penny_1326.flye.racon2.fasta salk-tm-dev/pennycress/1326/pennycress_1326.S003B2.S004B2.all.dedup.fastq.gz > salk-tm-dev/pennycress/1326/penny_1326.flye.racon2.fasta.mm2.paf
        racon  -t 16 salk-tm-dev/pennycress/1326/pennycress_1326.S003B2.S004B2.all.dedup.fastq.gz salk-tm-dev/pennycress/1326/penny_1326.flye.racon2.fasta.mm2.paf salk-tm-dev/pennycress/1326/penny_1326.flye.racon2.fasta > salk-tm-dev/pennycress/1326/penny_1326.flye.racon3.fasta


[Sun Sep 20 16:40:08 2020]
localrule all:
    input: salk-tm-dev/pennycress/1326/penny_1326.flye.racon3.pilon1.fasta, salk-tm-dev/pennycress/1326/busco_summary_penny_1326.flye.racon3.pilon1.txt, salk-tm-dev/pennycress/1326/busco_table_penny_1326.flye.racon3.pilon1.tsv
    jobid: 0
    resources: disk_mb=1000000

Job counts:
        count   jobs
        1       all
        1       busco
        1       pilon
        1       racon
        4
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

Note that the last rule in the dry run is trying to produce "penny_1326.flye.racon3.fasta" even though it already exists. Any idea why this is happening? I've never gotten this behavior when not using tibanna as the backend.

Cannot open log/postrunjson files previously viewed

When running the command tibanna log --job-id=JOBIDHERE, I get the message "log/postrunjson file is not ready yet. Wait a few seconds/minutes and try again.", which I understand is expected behavior. However, I get this message for log files I have already viewed previously. I am not sure whether this is a client-side error or not, but I did not expect to see this message for log files I have previously viewed. Thank you!
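
For reference, as far as I can tell these are the two variants of the command involved (the job id is a placeholder):

tibanna log --job-id=JOBIDHERE       # fetches the (running or final) log file from the log bucket
tibanna log --job-id=JOBIDHERE -p    # fetches the postrun json, which is written only after the run finishes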

Help test.sh fail, and remove `source`?

Without set -e, only the status from the last line in test.sh is going to matter.

$ bash

$ cat /tmp/not-failing.sh 
#!/bin/bash
false
echo 'still got here!'

$ source /tmp/not-failing.sh 
still got here!

$ cat /tmp/failing.sh 
#!/bin/bash
set -e
false
echo 'still got here!'

$ source /tmp/failing.sh 

$ 

... and if we add set -e, it's probably better not to use source: it's better to be explicit about what needs to be exported to a script, and I think set -e can confuse Travis, which has internal scripts that return non-zero status but aren't really problems.
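
A minimal sketch of that direction (the file name and variable are hypothetical): keep set -e inside the script, make its required inputs explicit, and run it as a child process instead of source-ing it, so the caller's shell options stay untouched.

#!/bin/bash
# tests/test.sh (hypothetical) -- fail fast on the first error
set -e
: "${TEST_ENV:?TEST_ENV must be set}"   # be explicit about what the script needs
false                                   # any failing command stops the script here
echo 'still got here!'                  # never reached

# caller: run as a subprocess, not `source`
TEST_ENV=travis bash tests/test.sh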

PR coming soon...

AttributeError: 'module' object has no attribute 'TemporaryDirectory'

Hi,

When installing Tibanna and running tibanna deploy_unicorn --usergroup=test this error pops up:

setting up tibanna usergroup environment on AWS...
WARNING: Without setting buckets (using --buckets),Tibanna would have access to only public buckets.To give permission to Tibanna for private buckets,use --buckets=,,...
creating iam permissions with tibanna policy prefix tibanna_test
No handlers could be found for logger "tibanna.utils"
Tibanna usergroup test has been created on AWS.
creating a new step function... tibanna_unicorn_test
deploying lambdas...
preparing for deploy...
name=check_task_awsem
role_arn=arn:aws:iam::558013456147:role/tibanna_test_check_task_awsem
Traceback (most recent call last):
  File "/usr/bin/tibanna", line 11, in <module>
    load_entry_point('tibanna==0.10.2', 'console_scripts', 'tibanna')()
  File "/usr/lib/python2.7/site-packages/tibanna-0.10.2-py2.7.egg/tibanna/__main__.py", line 412, in main
    subcommandf(*sc_args)
  File "/usr/lib/python2.7/site-packages/tibanna-0.10.2-py2.7.egg/tibanna/__main__.py", line 285, in deploy_unicorn
    usergroup=usergroup, do_not_delete_public_access_block=do_not_delete_public_access_block)
  File "/usr/lib/python2.7/site-packages/tibanna-0.10.2-py2.7.egg/tibanna/core.py", line 778, in deploy_unicorn
    do_not_delete_public_access_block=do_not_delete_public_access_block)
  File "/usr/lib/python2.7/site-packages/tibanna-0.10.2-py2.7.egg/tibanna/core.py", line 769, in deploy_tibanna
    self.deploy_core('all', suffix=suffix, usergroup=usergroup)
  File "/usr/lib/python2.7/site-packages/tibanna-0.10.2-py2.7.egg/tibanna/core.py", line 711, in deploy_core
    self.deploy_lambda(name, suffix, usergroup)
  File "/usr/lib/python2.7/site-packages/tibanna-0.10.2-py2.7.egg/tibanna/core.py", line 699, in deploy_lambda
    extra_config=extra_config)
  File "/home/ec2-user/NGS_software/tibanna/.eggs/python_lambda_4dn-0.12.3-py2.7.egg/aws_lambda/aws_lambda.py", line 84, in deploy_function
    with tempfile.TemporaryDirectory() as tmp_dir:
AttributeError: 'module' object has no attribute 'TemporaryDirectory'

Any suggestions? Thank you
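
For context, tempfile.TemporaryDirectory only exists in Python 3, which is why the Python 2.7 interpreter in the paths above raises this AttributeError. A rough Python 2-compatible stand-in (a sketch, not what python-lambda-4dn actually does) would look like this:

# sketch of a Python 2-compatible replacement for tempfile.TemporaryDirectory
import contextlib
import shutil
import tempfile

@contextlib.contextmanager
def temporary_directory():
    tmp_dir = tempfile.mkdtemp()   # mkdtemp exists in both Python 2 and 3
    try:
        yield tmp_dir
    finally:
        shutil.rmtree(tmp_dir)     # remove the directory and everything in it

# usage mirrors the failing call:
# with temporary_directory() as tmp_dir:
#     ...

That said, the simpler fix is probably to install and run Tibanna under Python 3.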

Can ebs_size get increased via the snakemake command?

The majority of the time, I work with relatively small bcl folders. Sometimes I have very large bcl folders that I want to download. Currently I specify a disk_mb in the snakemake rule. How do you recommend increasing the size of disk_mb as data grows? Can I do something like --tibanna-config ebs_size=10*{resources.disk_mb}? I know from your documentation that it can be provided in the format of x (e.g. 3x, 5.5x) to request x times the total input size; however, directories do not work as inputs or outputs (issue: snakemake/snakemake#355).
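
As far as I can tell, per-run tibanna settings can also be passed through Snakemake's --tibanna-config flag, so a hedged option (assuming ebs_size accepts the x-multiplier format there too) would be:

# request 10x the total input size as EBS space for every job in this run
snakemake --tibanna --default-remote-prefix=mybucket/mydir --tibanna-config ebs_size=10x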

Cannot use cleanup when group starts with tibanna

I made a group called tibannatest, and as a result I cannot use tibanna cleanup, because it assumes the group name should not start with tibanna (lines 1127 & 1128 in core.py).

if user_group_name.startswith('tibanna'):
    raise Exception("User_group_name does not start with tibanna or tibanna_unicorn."

Maybe add an exception at deploy time saying groups starting with "tibanna" or "tibanna_unicorn" cannot be deployed? Another option would be just checking whether tibanna_unicorn_[group] or [group] exists, and then proceeding with cleanup.
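
A rough sketch of that second option, with made-up function names and using boto3 directly (not the actual core.py code):

# sketch: look up which step function actually exists instead of
# rejecting every usergroup name that starts with 'tibanna'
import boto3

def resolve_step_function_name(user_group_name):
    sfn = boto3.client('stepfunctions')
    existing = set()
    for page in sfn.get_paginator('list_state_machines').paginate():
        existing.update(sm['name'] for sm in page['stateMachines'])
    for candidate in ('tibanna_unicorn_' + user_group_name, user_group_name):
        if candidate in existing:
            return candidate
    raise Exception("No step function found for usergroup %s" % user_group_name)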

Costing of Tibanna Runs

Hi there,

Thanks for this software. I've been experimenting with it using Snakemake workflows and trying to get my head around how it works. If I understand correctly, what is happening under the hood:

  1. Each rule/job is converted into an AWS lambda function
  2. A set of lambda functions are strung together using AWS step functions
  3. Each lambda function will launch an EC2 instance, download the files it needs from S3, run the job/rule, upload the files to S3, and then terminate the instance.

In this way, it provides serverless computing as you don't need to provision EC2 instances before running the Snakemake workflow.

What is not clear to me is whether the charge for the EC2 instance is by the hour or by run time (the latter being the major benefit of AWS Batch). If it's by the hour, wouldn't Tibanna runs be expensive, since each job requires launching a new EC2 instance?

Couldn't get a file descriptor referring to the console

Hi,

snakemake = 5.17.0
tibanna = 0.18.0

I am trying to run snakemake workflow on AWS using tibanna. I followed the snakemake instructions to deploy tibanna unicorn and other setups. AWS credentials configured accordingly.

My dry run was successful:
$snakemake -np --use-conda --tibanna --default-remote-prefix=extoolbackup/data
The command output is as expected and prints all the details I expected.

However, when I execute the workflow:
$snakemake --use-conda --tibanna --default-remote-prefix=extoolbackup/data
the execution failed and the error message is "Couldn't get a file descriptor referring to the console".

However, I successfully executed the workflow locally:
$snakemake --use-conda --verbose --cores 4

I have attached the output of the workflow run with --verbose, as well as the Snakefile.

log.txt
Snakefile.txt
environment.yaml.txt

Thanks
Chandan

Support for oracle cloud

Hello,

Thank you for this very useful tool. Do you have any plans to implement Tibanna for Oracle Cloud? It has the same architecture as Amazon S3.

Thank you!!

Tibanna ignoring mem for instance selection?

Hi,
I have a task set to run with 1 core and 2GB of memory, but it keeps using the t3.micro instance. Is this behavior correct? With 2 cores set it uses the small instance.
Step function output:

{
  "jobid": "x",
  "args": {
    "output_S3_bucket": "x",
    "language": "snakemake",
    "container_image": "snakemake/snakemake:v5.10.0",
    "input_files": {},
    "output_target": {
      "file:///data1/snakemake/reference_data/metaerg_install": "s3:/x/KnuttBinAnno/reference_data/metaerg_install",
      "file:///data1/snakemake/reference_data/install_metaerg.log": "s3://x/KnuttBinAnno/reference_data/install_metaerg.log"
    },
    "snakemake_directory_local": "/home/x/x",
    "snakemake_main_filename": "Snakefile",
    "snakemake_child_filenames": [
      "files/files"
    ],
    "command": "snakemake reference_data/metaerg_install --snakefile Snakefile --force -j2 --keep-target-files  --keep-remote --latency-wait 0 --attempt 1 --force-use-threads  --allowed-rules installMetaErg --nocolor --notemp --no-hooks --nolock  --use-conda  --default-resources \"mem_mb=max(2*input.size, 1000)\" \"disk_mb=max(2*input.size, 1000)\" ",
    "input_parameters": {},
    "input_env": {},
    "secondary_files": {},
    "secondary_output_target": {},
    "alt_cond_output_argnames": {},
    "additional_benchmarking_parameters": {},
    "app_version": "",
    "app_name": "",
    "snakemake_directory_url": "s3://x-x/x.workflow/",
    "dependency": {}
  },
  "_tibanna": {
    "run_id": "x",
    "env": "",
    "url": "x",
    "run_type": "generic",
    "run_name": "snakemake-job-x-rule-installMetaErg",
    "exec_arn": "x"
  },
  "config": {
    "run_name": "snakemake-job-x-rule-installMetaErg",
    "mem": 2,
    "cpu": 1,
    "ebs_size": 2,
    "log_bucket": "x",
    "instance_type": "t3.micro",
    "EBS_optimized": true,
    "ebs_iops": "",
    "password": "",
    "key_name": "",
    "spot_duration": "",
    "availability_zone": "",
    "ebs_type": "gp2",
    "shutdown_min": "now",
    "spot_instance": false,
    "behavior_on_capacity_limit": "fail",
    "cloudwatch_dashboard": false,
    "public_postrun_json": false,
    "root_ebs_size": 8,
    "script_url": "https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf/",
    "json_bucket": "x",
    "ami_id": "ami-x",
    "language": "snakemake",
    "job_tag": "",
    "instance_id": "i-x",
    "instance_ip": "x",
    "start_time": "20200213-22:33:04-UTC"
  }
}
