Comments (3)
Upon further investigation, this seems to be caused by grouped jobs not being timestamped correctly when their outputs are uploaded to AWS. Tibanna wants to rerun the racon rule that produced "penny_1326.flye.racon3.fasta" because that file happened to be uploaded to AWS before "penny_1326.flye.racon2.fasta", which is an input to the rule that produces it. This was only possible because my earlier execution ran all racon jobs as a single group, which apparently causes all three racon files to be uploaded in an arbitrary order.
The user-level fix is to edit the timestamps in AWS and clear out the ".snakemake" cache. The long-term solution is to make Tibanna upload output files in the order defined by the DAG of the group job being executed. Not being very familiar with the source code of Snakemake or Tibanna, I'm not certain this is an easy or even possible change to make.
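The failure mode described in this comment can be modelled with a simplified sketch. It assumes a plain "any input newer than the output means rerun" check, which is a stand-in for illustration and not Snakemake's or Tibanna's actual code; `is_stale` and the upload times are hypothetical.

```python
from datetime import datetime, timedelta


def is_stale(output_mtime, input_mtimes):
    """Return True if any input is newer than the output (simplified model)."""
    return any(t > output_mtime for t in input_mtimes)


# Hypothetical upload times: racon3 reaches S3 one second before racon2,
# even though racon2 is an input of the rule that produces racon3.
t0 = datetime(2020, 1, 1, 12, 0, 0)
uploaded = {
    "penny_1326.flye.racon3.fasta": t0,
    "penny_1326.flye.racon2.fasta": t0 + timedelta(seconds=1),
}

rerun = is_stale(
    uploaded["penny_1326.flye.racon3.fasta"],
    [uploaded["penny_1326.flye.racon2.fasta"]],
)
print(rerun)
```

Under this model the output looks older than its input, so the rule is scheduled again even though nothing actually changed.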
@nhartwic Thank you for reporting this. The best fix would be to preserve the timestamps of the output files, but as far as I know AWS S3 does not provide that option. The output files from a given instance ('group') can be sorted before being uploaded to S3, but that still would not guarantee that all output files are uploaded in the correct order if multiple instances are running concurrently (parallel independent group jobs). I'll see if I can at least get the files ordered within a group.
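Ordering the files within a group could look roughly like the sketch below. It is only an illustration using the standard-library `graphlib` module (Python 3.9+), not Tibanna's implementation; `upload_order`, the `deps` mapping, and the file name "penny_1326.flye.racon1.fasta" are assumptions made for the example.

```python
from graphlib import TopologicalSorter


def upload_order(deps):
    """Return files ordered so that every file comes after the files it depends on.

    deps maps each output file to the set of files it is produced from.
    """
    return list(TopologicalSorter(deps).static_order())


# Hypothetical dependency chain for a single racon group job.
deps = {
    "penny_1326.flye.racon1.fasta": set(),
    "penny_1326.flye.racon2.fasta": {"penny_1326.flye.racon1.fasta"},
    "penny_1326.flye.racon3.fasta": {"penny_1326.flye.racon2.fasta"},
}

order = upload_order(deps)
for name in order:
    # A real implementation would upload each file to S3 here, in this order.
    print(name)
```

Uploading sequentially in this order would give each output a later S3 timestamp than its inputs within the group. As noted above, it does nothing for ordering across groups that run concurrently.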
Sounds good. As long as the output files for each group are ordered correctly, that is probably sufficient: any dependencies of the group must have been uploaded before the group's execution, and any downstream products must be uploaded after it, simply because of the way groups get spawned. The only potential errors would arise if multiple partial runs were performed in which the DAG topology meaningfully changes, but I'd argue that in such cases the rules themselves are the problem and Snakemake in general can't resolve the issue. As an example, imagine run 1 has the structure "rule A -> rule B" and run 2 has the structure "rule B -> rule A -> rule C". This should probably never happen and ought to be avoided by workflow writers.