snakemake_chipseq_pe's People

Contributors

jihedc, mgalland, tijsbliek

snakemake_chipseq_pe's Issues

Use all treatment and control

To test the peak-calling rules, I ran the bed branch on real samples. The good news is that peak calling works well. There is still the issue with the indexing of the BAM file (see #8).

The problem is that the pipeline only called peaks for ATAC1 vs ATAC4, while it should also have worked for all treatment and control pairs.

  • treatment= 'ATAC1', 'ATAC2', 'ATAC3'
  • control = 'ATAC4', 'ATAC5', 'ATAC6'.

I expect the files ATAC2 vs ATAC5 and ATAC3 vs ATAC6 as well.
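
As a rough illustration of the expected behaviour, the sketch below pairs each treatment with the control at the same position in the list and builds one target file per pair. The function name `paired_peak_targets` and the output path pattern are hypothetical and not taken from the pipeline; only the sample names come from this issue.

```python
# Sample names from the issue; everything else below is an assumption.
TREATMENT = ["ATAC1", "ATAC2", "ATAC3"]
CONTROL = ["ATAC4", "ATAC5", "ATAC6"]


def paired_peak_targets(treatment, control,
                        pattern="results/peaks/{t}_vs_{c}_peaks.narrowPeak"):
    """Return one target path per (treatment, control) pair, matched by position.

    The path pattern is a placeholder, not the pipeline's real layout.
    """
    if len(treatment) != len(control):
        raise ValueError("treatment and control lists must have the same length")
    # zip pairs the i-th treatment with the i-th control (no all-vs-all product)
    return [pattern.format(t=t, c=c) for t, c in zip(treatment, control)]


targets = paired_peak_targets(TREATMENT, CONTROL)
for path in targets:
    print(path)
```

Inside a Snakefile, the same position-wise pairing can usually be obtained by passing `zip` as the combinator to `expand`, instead of relying on the default all-against-all product.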

Create one .yaml environment file per rule

To increase reproducibility and avoid the need to install and activate a conda virtual environment before running Snakemake, it would be good to create one environment file per rule (envs/rule1.yaml) so that Snakemake can be run with the --use-conda argument.

Implement singularity

Add singularity management within Snakemake.

From the documentation:

Singularity enables users to have full control of their environment. Singularity containers can be used to package entire scientific workflows, software and libraries, and even data. This means that you don’t have to ask your cluster admin to install anything for you - you can put it in a Singularity container and run.

Single end

The Snakemake pipeline currently only supports paired-end (PE) sequencing; it would be good to have it work for single-end data as well.

Make a documentation for deeptools

Documentation for deeptools already exists in its repository. I think it would be nice to include part of that documentation in the README file, or at least link to it, in order to explain the purpose of the figures generated by the pipeline and how to interpret them.

bedgraph -bga

Hello Jihed,
About the pull request #3:
Why do you want to report regions with zero coverage (bedtools genomecov -bga)? Is there a specific reason? Because this will significantly increase the size of your bedgraph files.
I think you could simply use the -bg option there and only report the positions with some coverage.
Hope it helps,
Cheers
Marc
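
To make the size argument concrete, here is a small sketch of what separates the two outputs: with -bga, zero-coverage intervals are reported in addition to the covered ones, so dropping the zero-valued records should give what -bg reports. The helper `drop_zero_coverage` and the example records are made up for illustration and are not pipeline output.

```python
def drop_zero_coverage(bedgraph_lines):
    """Keep only bedGraph intervals whose coverage (4th column) is non-zero."""
    kept = []
    for line in bedgraph_lines:
        record = line.rstrip("\n")
        fields = record.split("\t")
        if len(fields) < 4:
            continue  # skip empty or malformed lines
        if float(fields[3]) != 0:
            kept.append(record)
    return kept


# Made-up records: chrom, start, end, coverage (tab-separated).
bga_like = [
    "chr1\t0\t100\t0",
    "chr1\t100\t150\t3",
    "chr1\t150\t400\t0",
    "chr1\t400\t420\t1",
]
bg_like = drop_zero_coverage(bga_like)
print(len(bga_like), "intervals including zero coverage,",
      len(bg_like), "intervals without")
```

On a real genome, where large stretches are uncovered, the zero-coverage records can make up a sizeable share of the file.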

Handle GTF or GFF formats

For tomato, one has to generate the GTF file from the GFF3 format.
For other species, you can provide a GTF file directly.
The rules in 'external_data.smk' have to be changed accordingly.
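
One possible way to support both cases is to decide from the annotation file name whether a conversion step is needed. The sketch below is only an assumption about how this could look: the helper names, the accepted extensions, and the example paths are hypothetical, and guessing the format from the extension is a simplification.

```python
from pathlib import Path

GTF_SUFFIXES = {".gtf"}
GFF_SUFFIXES = {".gff", ".gff3"}


def annotation_format(path):
    """Guess the annotation format from the file extension (a trailing .gz is ignored)."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    last = suffixes[-1] if suffixes else ""
    if last in GTF_SUFFIXES:
        return "gtf"
    if last in GFF_SUFFIXES:
        return "gff"
    raise ValueError(f"cannot tell whether {path!r} is GTF or GFF")


def needs_gtf_conversion(path):
    """True when the annotation is GFF/GFF3 and must be converted to GTF first."""
    return annotation_format(path) == "gff"


# Hypothetical file names, for illustration only.
print(needs_gtf_conversion("annotation/tomato.gff3"))
print(needs_gtf_conversion("annotation/other_species.gtf"))
```

The rules in 'external_data.smk' could then run the conversion only when such a check returns true, and use the provided GTF file as is otherwise.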

HPC Cluster execution

Transform the Snakemake pipeline so that it can be executed on a cluster environment such as LISA (SURF). On LISA, the batch job management system is SLURM.

Write a publication in "The Journal of Open Source Software"

We should include a CITATION file in the main repository to indicate how to cite the pipeline.
To have a proper scientific publication, we could write a paper for the "Journal of Open Source Software".
That way, the pipeline could be properly cited. Publications are short (~3 pages) and easy to write.

See an example:
http://joss.theoj.org/papers/6eb3ba7dddbdab8788a430eb62fc3841

Citation would look like:
Bennett et al., (2018). restez: Create and Query a Local Copy of GenBank in R. Journal of Open Source Software, 3(31), 1102, https://doi.org/10.21105/joss.01102

Deeptools : Correlation plot

Implement the correlation plot using deeptools:

  • multiBamSummary
  • plotCorrelation

Define the best correlation method
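
A minimal sketch of how the two deeptools commands could be chained is given below. It only builds the command lines and does not run them. The output paths and the helper name `correlation_commands` are assumptions; the correlation method is left as a parameter because the best choice is still open.

```python
import shlex


def correlation_commands(bams,
                         npz="results/multibamsummary.npz",
                         plot="results/correlation_heatmap.png",
                         method="spearman"):
    """Build the multiBamSummary and plotCorrelation command lines.

    Paths are placeholders; 'method' is passed to --corMethod.
    """
    if method not in ("spearman", "pearson"):
        raise ValueError("method must be 'spearman' or 'pearson'")
    # Step 1: read counts per genomic bin for all BAM files
    summary = ["multiBamSummary", "bins", "--bamfiles", *bams, "-o", npz]
    # Step 2: correlation heatmap computed from the summary file
    plot_cmd = ["plotCorrelation", "-in", npz, "--corMethod", method,
                "--whatToPlot", "heatmap", "-o", plot]
    return summary, plot_cmd


summary_cmd, plot_cmd = correlation_commands(
    ["results/mapped/ChIP1_L1.sorted.rmdup.bam",
     "results/mapped/ChIP2_L1.sorted.rmdup.bam"]
)
for cmd in (summary_cmd, plot_cmd):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Switching `method` between "spearman" and "pearson" on the same summary file is a cheap way to compare both correlation methods before settling on one.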

order in the rules

When running the Snakefile with multiple cores, it seems that the BAM file is indexed after the rules that use it have already started:

```
'results/mapped/ChIP1_L1.sorted.rmdup.bam' does not appear to have an index. You MUST index the file first!
    Error in rule bamcompare:
        jobid: 3
        output: results/bamcompare/log2_ChIP1_ChIP2_L1.bamcompare.bw

RuleException:
CalledProcessError in line 249 of /Users/Jihed/Desktop/DMC1_ChIPseq/Snakefile:
Command ' set -euo pipefail;  bamCompare -b1 results/mapped/ChIP1_L1.sorted.rmdup.bam -b2 results/mapped/ChIP2_L1.sorted.rmdup.bam -o results/bamcompare/log2_ChIP1_ChIP2_L1.bamcompare.bw ' returned non-zero exit status 1
  File "/Users/Jihed/Desktop/DMC1_ChIPseq/Snakefile", line 249, in __rule_bamcompare
  File "/anaconda3/envs/bigwig/lib/python3.5/concurrent/futures/thread.py", line 55, in run
```

The rule producing the BAM index files is `rule bam_index`, which takes the sorted BAM files as input. Its output is required by the `rule_all`, so the index files should be produced at some point, and the rule appears in the DAG. However, I cannot find the `*.bai` files in the output folder.
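
One possible explanation, offered here as an assumption rather than a confirmed diagnosis, is that the rules consuming the BAM files do not list the index among their inputs, so the scheduler is free to start them before `bam_index` has finished. The sketch below shows the idea of returning the BAM file together with its index. The helper names are hypothetical, and the `.bam.bai` naming assumes the index is written next to the BAM file.

```python
def bam_inputs(sample, mapped_dir="results/mapped"):
    """Return the BAM path and its expected index path for one sample.

    Assumes the index sits next to the BAM file as <file>.bam.bai.
    """
    bam = f"{mapped_dir}/{sample}.sorted.rmdup.bam"
    return {"bam": bam, "bai": bam + ".bai"}


def bamcompare_inputs(sample1, sample2):
    """Inputs for a bamCompare-like step: both BAM files plus both indexes."""
    first, second = bam_inputs(sample1), bam_inputs(sample2)
    return {
        "b1": first["bam"],
        "b1_index": first["bai"],   # forces the index job to run first
        "b2": second["bam"],
        "b2_index": second["bai"],
    }


inputs = bamcompare_inputs("ChIP1_L1", "ChIP2_L1")
for name, path in inputs.items():
    print(name, path)
```

If the index files are declared as inputs of `bamcompare` in this way, the job should only start once both indexes exist, independent of the number of cores.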

Some logs are incomplete

Some of the logs produced by the pipeline (branch develop) are empty files:

  • bedgraph.log
  • bamcompare.log
  • macs2 narrowPeak
  • samtools rmdup

Move the definitions of the group for deep tools to the configuration file

On the Deeptools branch:
For now, the groups used to build the matrix are defined in the Snakefile, so the Snakefile itself has to be edited to change them.
Until I find a handier way to define the groups, I should move the group definition to the configuration file and refer to it in the Snakefile.
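
A minimal sketch of this idea, assuming the groups are stored under a hypothetical `deeptools_groups` key of the configuration, is shown below. The group names and the helper `get_groups` are made up for illustration; a plain dictionary stands in for the configuration that Snakemake would load from the config file.

```python
def get_groups(config, key="deeptools_groups"):
    """Read the sample groups from the configuration and validate them.

    'key' is a hypothetical name; the pipeline may choose another one.
    """
    groups = config.get(key)
    if not isinstance(groups, dict) or not groups:
        raise KeyError(f"config has no non-empty '{key}' mapping")
    for name, samples in groups.items():
        if not isinstance(samples, (list, tuple)) or not samples:
            raise ValueError(f"group '{name}' must be a non-empty list of samples")
    return {name: list(samples) for name, samples in groups.items()}


# Stand-in for the loaded configuration; the values are examples only.
example_config = {
    "deeptools_groups": {
        "treatment": ["ChIP1_L1"],
        "control": ["ChIP2_L1"],
    }
}
groups = get_groups(example_config)
print(groups)
```

With this layout, changing the groups only requires editing the configuration file, and the Snakefile stays untouched.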

test

to test the automation
