koesgroup / snakemake_chipseq_pe
Pipeline for the analysis of PE ChIP-seq data
License: Creative Commons Attribution Share Alike 4.0 International
For a quick overview of the pipeline, I think it would be nice to include the DAG (directed acyclic graph) of the pipeline.
There are a lot of files that can be removed once the pipeline is finished. Here
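Within Snakemake itself, intermediate outputs can be wrapped in `temp()` so they are deleted as soon as no remaining job needs them. As an alternative for files that should only go away at the very end, here is a minimal post-run cleanup sketch. The helper name `clean_intermediates`, the glob patterns and the `results/` layout are hypothetical and are not the pipeline's actual list of disposable files.

```python
from pathlib import Path


def clean_intermediates(results_dir, patterns, dry_run=True):
    """Remove files under results_dir that match any of the glob patterns.

    Hypothetical helper: the patterns are examples only. With dry_run=True
    nothing is deleted; the matching paths are just returned for review.
    """
    root = Path(results_dir)
    # Collect matches into a set first so a file hit by two patterns is listed once.
    matched = sorted({p for pattern in patterns for p in root.glob(pattern) if p.is_file()})
    if not dry_run:
        for path in matched:
            path.unlink()
    return matched
```

For example, `clean_intermediates("results", ["mapped/*.sam"])` would only list the SAM files; passing `dry_run=False` actually deletes them. Defaulting to a dry run makes it harder to wipe outputs by accident.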
To test the peak-calling rules, I ran the bed branch on real samples. The good news is that peak calling works well. The BAM indexing issue (#8) is still open.
The problem is that the pipeline only called peaks for ATAC1 vs ATAC4, whereas it should have done so for every treatment/control pair.
I expected the ATAC2 vs ATAC5 and ATAC3 vs ATAC6 files as well.
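A plausible cause (not confirmed in this issue) is that the treatment and control lists are not paired element by element when the peak-calling targets are built. Below is a minimal sketch of position-wise pairing, assuming that the i-th treatment sample goes with the i-th control sample; the sample names come from this issue, while the helper `pair_samples` and the MACS2-style output path are hypothetical. In a Snakefile the same effect is usually obtained with `expand(pattern, zip, treatment=..., control=...)`, because `expand()` without `zip` builds every combination.

```python
from itertools import product


def pair_samples(treatments, controls):
    """Pair the i-th treatment with the i-th control (assumed experimental design)."""
    if len(treatments) != len(controls):
        raise ValueError("each treatment needs exactly one matching control")
    return list(zip(treatments, controls))


treatments = ["ATAC1", "ATAC2", "ATAC3"]
controls = ["ATAC4", "ATAC5", "ATAC6"]

pairs = pair_samples(treatments, controls)
# One peak-calling output per pair (hypothetical file naming).
peak_files = [f"results/macs2/{t}_vs_{c}_peaks.narrowPeak" for t, c in pairs]

# For comparison: itertools.product (what expand() does by default) yields all
# 3 x 3 = 9 combinations, including unintended ones such as ATAC1 vs ATAC5.
all_combinations = list(product(treatments, controls))
```

Raising an error on unequal list lengths makes a mismatched sample sheet fail loudly instead of silently dropping comparisons.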
To increase reproducibility and avoid having to install and activate a conda virtual environment before running Snakemake, it would be good to create one environment file per rule (envs/rule1.yaml) so that Snakemake can be run with the --use-conda argument.
Add Singularity support within Snakemake.
From the documentation:
Singularity enables users to have full control of their environment. Singularity containers can be used to package entire scientific workflows, software and libraries, and even data. This means that you don’t have to ask your cluster admin to install anything for you - you can put it in a Singularity container and run.
For now, the Snakemake pipeline only works with paired-end (PE) sequencing data; it would be good to make it work for single-end (SE) data as well.
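One way single-end support could be handled is to decide per sample whether a second FASTQ file exists and build the aligner arguments accordingly. The sketch below assumes Bowtie2 as the aligner (`-1`/`-2` for mates, `-U` for unpaired reads are Bowtie2's options); the sample dictionary, its `fq1`/`fq2` keys and the file names are hypothetical and may not match the pipeline's sample sheet.

```python
def is_paired_end(sample):
    """Treat a sample as paired-end when it has a non-empty 'fq2' entry."""
    return bool(sample.get("fq2"))


def bowtie2_read_args(sample):
    """Return the Bowtie2 read-input arguments for one sample.

    Paired-end: -1 <mate1> -2 <mate2>; single-end: -U <reads>.
    The 'fq1'/'fq2' keys are a hypothetical sample-sheet layout.
    """
    if is_paired_end(sample):
        return ["-1", sample["fq1"], "-2", sample["fq2"]]
    return ["-U", sample["fq1"]]


samples = {
    "ChIP1": {"fq1": "fastq/ChIP1_R1.fastq.gz", "fq2": "fastq/ChIP1_R2.fastq.gz"},
    "ChIP2": {"fq1": "fastq/ChIP2.fastq.gz", "fq2": ""},
}

for name, sample in samples.items():
    print(name, " ".join(bowtie2_read_args(sample)))
```

In a Snakefile, a function like this could be called from a `params` function so that one mapping rule serves both library types.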
The deeptools documentation already exists in its repository. I think it would be nice to include part of it in the README file, or at least link to it, to explain the purpose of the figures generated by the pipeline and how to interpret them.
Small mistake to fix on the bed branch: CASES should be treatment and CONTROLS should be control.
```
CASES = get_samples_per_treatment(treatment="treatment")
CONTROLS = get_samples_per_treatment(treatment="control")
```
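`get_samples_per_treatment` itself is not shown here. As a hypothetical illustration of what such a helper could look like, the sketch below assumes a tab-separated sample sheet with `sample` and `treatment` columns; the real function in the pipeline may read a different file or use pandas.

```python
import csv
import io

# Hypothetical sample sheet; the real pipeline's columns may differ.
SAMPLE_SHEET = """sample\ttreatment
ATAC1\ttreatment
ATAC2\ttreatment
ATAC4\tcontrol
ATAC5\tcontrol
"""


def get_samples_per_treatment(treatment, sheet=SAMPLE_SHEET):
    """Return the names of samples whose 'treatment' column equals `treatment`."""
    reader = csv.DictReader(io.StringIO(sheet), delimiter="\t")
    return [row["sample"] for row in reader if row["treatment"] == treatment]


CASES = get_samples_per_treatment(treatment="treatment")
CONTROLS = get_samples_per_treatment(treatment="control")
```

Filtering on the exact column value means a typo such as `treatment="controls"` returns an empty list, which is worth checking for explicitly in the Snakefile.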
Hello Jihed,
About the pull request #3:
Why do you want to report regions with zero coverage (bedtools genomecov -bga)? Is there a specific reason? This will significantly increase the size of your bedgraph files.
I think you could simply use the -bg option there and only report the positions with some coverage.
Hope it helps,
Cheers
Marc
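To illustrate the size difference Marc points out, here is a minimal sketch that collapses the per-base coverage of one chromosome into bedGraph intervals, either skipping zero-coverage runs (the `-bg` behaviour) or keeping them (the `-bga` behaviour). The coverage vector is made up for the example, and the function is an illustration, not how bedtools is implemented.

```python
def coverage_to_bedgraph(chrom, coverage, include_zero=False):
    """Collapse per-base coverage into bedGraph records (0-based, half-open).

    include_zero=False mimics `bedtools genomecov -bg` (zero-coverage runs
    omitted); include_zero=True mimics `-bga` (zero-coverage runs reported).
    """
    records = []
    start = 0
    for pos in range(1, len(coverage) + 1):
        # Close the current run at the chromosome end or when the depth changes.
        if pos == len(coverage) or coverage[pos] != coverage[start]:
            depth = coverage[start]
            if depth > 0 or include_zero:
                records.append((chrom, start, pos, depth))
            start = pos
    return records


coverage = [0, 0, 0, 2, 2, 3, 0, 0, 1, 0]
bg = coverage_to_bedgraph("chr1", coverage)
bga = coverage_to_bedgraph("chr1", coverage, include_zero=True)
```

Here `bg` has 3 records and `bga` has 6; on a real genome, where most bins of a ChIP-seq sample have no reads, the extra zero-coverage lines can dominate the file.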
For tomato, the GTF file has to be generated from the GFF3 file.
For other species, you can provide a GTF file directly.
The corresponding rules have to be changed in 'external_data.smk'.
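In practice this conversion is usually done with a dedicated tool (for example `gffread` with its `-T` option, which writes GTF). To show what the conversion involves, here is a simplified sketch that rewrites GFF3 exon and CDS lines as GTF, resolving `gene_id` through the `Parent` of each mRNA. It assumes a plain gene → mRNA → exon/CDS hierarchy with a single parent per feature and parents listed before children, and it ignores all other features, so it is not a complete converter.

```python
def parse_attributes(field):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for item in field.strip().strip(";").split(";"):
        if item:
            key, _, value = item.partition("=")
            attrs[key] = value
    return attrs


def gff3_to_gtf(lines):
    """Convert exon/CDS records of a simple GFF3 file into GTF lines.

    Simplifying assumptions: each exon/CDS has exactly one Parent (an mRNA),
    each mRNA has one Parent (a gene), and parents appear before children.
    """
    transcript_to_gene = {}
    gtf = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        feature = cols[2]
        if feature == "mRNA":
            transcript_to_gene[attrs["ID"]] = attrs["Parent"]
        elif feature in ("exon", "CDS"):
            transcript_id = attrs["Parent"]
            gene_id = transcript_to_gene[transcript_id]
            gtf_attrs = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
            # The first 8 GFF3 columns are kept unchanged; only column 9 differs.
            gtf.append("\t".join(cols[:8] + [gtf_attrs]))
    return gtf
```

The function accepts any iterable of lines, so an open GFF3 file handle can be passed directly.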
Adapt the Snakemake pipeline so that it can be executed on a cluster such as LISA (SURF), whose batch job scheduler is SLURM.
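On SLURM, each Snakemake job ends up being submitted with `sbatch` (through a cluster profile, or the `--cluster` option in older Snakemake versions). Below is a minimal sketch of turning a rule's resources into an `sbatch` command line. The helper `sbatch_command`, the resource names and the default values are hypothetical; `--job-name`, `--cpus-per-task`, `--mem` and `--time` are standard `sbatch` options.

```python
def sbatch_command(rule, jobscript, threads=1, mem_mb=4000, runtime_min=60):
    """Build an sbatch command line for one job (illustrative only)."""
    hours, minutes = divmod(runtime_min, 60)
    return [
        "sbatch",
        f"--job-name={rule}",
        f"--cpus-per-task={threads}",
        f"--mem={mem_mb}M",
        f"--time={hours:02d}:{minutes:02d}:00",
        jobscript,
    ]


cmd = sbatch_command("bamcompare", "jobscript.sh", threads=8, mem_mb=16000, runtime_min=90)
print(" ".join(cmd))
# prints: sbatch --job-name=bamcompare --cpus-per-task=8 --mem=16000M --time=01:30:00 jobscript.sh
```

Keeping threads, memory and runtime as per-rule values means heavy steps such as mapping can request more than lightweight ones.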
Add MultiQC at the end of the pipeline to produce HTML reports.
I have found this repository, which seems to produce nice genome-browser views from bigWig files, BED files, etc.
This might be a good thing to add in the next release!
We should include a CITATION file in the main repository to indicate how to cite the pipeline.
To have a proper scientific publication, we could submit a paper to the "Journal of Open Source Software".
That way, the pipeline could be properly cited. Papers are short (~3 pages) and easy to write.
See an example:
http://joss.theoj.org/papers/6eb3ba7dddbdab8788a430eb62fc3841
The citation would look like:
Bennett et al., (2018). restez: Create and Query a Local Copy of GenBank in R. Journal of Open Source Software, 3(31), 1102, https://doi.org/10.21105/joss.01102
Implement the correlation plot using deeptools:
Define the best correlation method.
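deeptools' `plotCorrelation` can use either Pearson or Spearman correlation (`--corMethod`). To help choose, here is a minimal sketch computing both on two coverage vectors with invented numbers. Pearson measures linear agreement and can be dominated by a few very high-signal bins, while Spearman works on ranks and is more robust to such outliers.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Read counts per genomic bin for two samples (invented numbers); the last bin
# is an outlier with very high signal in both samples.
sample_a = [5, 7, 6, 8, 9, 500]
sample_b = [9, 6, 8, 5, 7, 480]
```

With these values Pearson is close to 1 because of the single outlier bin, while Spearman is close to 0, since the other bins do not agree at all. Comparing both on real data is one way to decide which `--corMethod` reflects sample similarity better.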
When running the Snakefile with multiple cores, it seems that the BAM files are indexed only after the rules that use them have already run:
```
'results/mapped/ChIP1_L1.sorted.rmdup.bam' does not appear to have an index. You MUST index the file first!
'results/mapped/ChIP1_L1.sorted.rmdup.bam' does not appear to have an index. You MUST index the file first!
Error in rule bamcompare:
jobid: 3
output: results/bamcompare/log2_ChIP1_ChIP2_L1.bamcompare.bw
RuleException:
CalledProcessError in line 249 of /Users/Jihed/Desktop/DMC1_ChIPseq/Snakefile:
Command ' set -euo pipefail; bamCompare -b1 results/mapped/ChIP1_L1.sorted.rmdup.bam -b2 results/mapped/ChIP2_L1.sorted.rmdup.bam -o results/bamcompare/log2_ChIP1_ChIP2_L1.bamcompare.bw ' returned non-zero exit status 1
File "/Users/Jihed/Desktop/DMC1_ChIPseq/Snakefile", line 249, in __rule_bamcompare
File "/anaconda3/envs/bigwig/lib/python3.5/concurrent/futures/thread.py", line 55, in run
```
The rule producing the BAM index file is `rule bam_index`, which takes sorted BAM files as input. Its output is required by `rule_all`, so the files should be produced at some point, and the rule appears in the DAG. However, I cannot find the `*.bai` files in the output folder.
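Snakemake orders jobs only through the files they declare as inputs and outputs. The sketch below models that with a topological sort (Kahn's algorithm) over simplified, hypothetical jobs named after the log above: if `bamcompare` does not list the `.bai` file among its inputs, nothing forces `bam_index` to run first, and with several cores the two may run in either order or at the same time. Declaring the index as an input of `rule bamcompare` adds the missing edge. Separately, if "sorted BAM files" means `*.sorted.bam` rather than `*.sorted.rmdup.bam`, the resulting index would not belong to the files `bamCompare` reads, which could explain why no matching `*.bai` shows up.

```python
from collections import deque


def job_order(jobs):
    """Return one valid execution order using Kahn's algorithm.

    `jobs` maps a job name to (inputs, outputs). A job depends on another job
    only if one of its declared inputs is among that job's declared outputs;
    files that are not declared create no ordering constraint.
    """
    producer = {out: name for name, (_, outputs) in jobs.items() for out in outputs}
    deps = {
        name: {producer[f] for f in inputs if f in producer}
        for name, (inputs, _) in jobs.items()
    }
    indegree = {name: len(d) for name, d in deps.items()}
    ready = deque(name for name in jobs if indegree[name] == 0)
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for name, d in deps.items():
            if job in d:
                indegree[name] -= 1
                if indegree[name] == 0:
                    ready.append(name)
    if len(order) != len(jobs):
        raise ValueError("dependency cycle detected")
    return order


bam = "results/mapped/ChIP1_L1.sorted.rmdup.bam"
bigwig = "results/bamcompare/log2_ChIP1_ChIP2_L1.bamcompare.bw"

# 'rmdup' is a hypothetical job standing in for whatever produces the BAM.
jobs_missing_bai = {
    "rmdup": ([], [bam]),
    "bamcompare": ([bam], [bigwig]),
    "bam_index": ([bam], [bam + ".bai"]),
}
# Same jobs, but bamcompare now declares the index as an input.
jobs_with_bai = dict(jobs_missing_bai)
jobs_with_bai["bamcompare"] = ([bam, bam + ".bai"], [bigwig])
```

`job_order(jobs_missing_bai)` puts `bamcompare` before `bam_index`, whereas `job_order(jobs_with_bai)` always places `bam_index` first.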
Some of the logs produced by the pipeline (develop branch) are empty files:
On the Deeptools branch:
For now, the groups used to make the matrix are defined in the Snakefile, so the Snakefile itself has to be modified to make the matrix.
Until I find a more convenient way to define the groups, I should move the group definition to the configuration file and refer to it from the Snakefile.
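Snakemake exposes the parsed configuration file as the `config` dictionary, so the Snakefile can build each group's file list from it instead of hard-coding the groups. In this minimal sketch `config` is a plain dict standing in for the parsed `config.yaml`; the `groups` key, the group names, the helper `bigwigs_for_group` and the bigWig path pattern are all hypothetical.

```python
# Stand-in for the dictionary Snakemake builds from config.yaml, e.g.:
#   groups:
#     H3K4me3: [ChIP1, ChIP2]
#     input: [Input1]
config = {
    "groups": {
        "H3K4me3": ["ChIP1", "ChIP2"],
        "input": ["Input1"],
    }
}


def bigwigs_for_group(group, cfg=config, pattern="results/bigwig/{sample}.bw"):
    """List the bigWig files of one group (hypothetical path pattern)."""
    try:
        samples = cfg["groups"][group]
    except KeyError:
        raise KeyError(f"group {group!r} is not defined in the config file") from None
    return [pattern.format(sample=s) for s in samples]


# One list of matrix inputs per group, built only from the config.
matrix_inputs = {group: bigwigs_for_group(group) for group in config["groups"]}
```

Changing or adding a group then only requires editing the configuration file, and an unknown group name fails with an explicit message.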
to test the automation