CUT&RUN-Flow (CnR-flow)

GNU GPLv3+ License

Zenodo DOI:10.5281/zenodo.4015699

Welcome to CUT&RUN-Flow (CnR-flow), a Nextflow pipeline for QC, tag trimming, normalization, and peak calling for paired-end sequencing data from CUT&RUN experiments.
This software is available via GitHub, at http://www.github.com/RenneLab/CnR-flow .
Full project documentation is available at CUT&RUN-Flow's ReadTheDocs.
Pipeline Design:
CUT&RUN-Flow is built using Nextflow, a powerful domain-specific workflow language built to create flexible and efficient bioinformatics pipelines. Nextflow provides extensive flexibility in utilizing cluster computing environments such as PBS and SLURM, and in automated and compartmentalized handling of dependencies using Conda / Bioconda and Environment Modules.
Dependencies:
In addition to standard local configurations, Nextflow allows handling of dependencies in separated working environments within the same pipeline using Conda or Environment Modules. CnR-flow is pre-configured to acquire and utilize dependencies using conda environments with no additional required setup.
CUT&RUN-Flow utilizes UCSC Genome Browser Tools and Samtools for reference library preparation, FastQC for tag quality control, Trimmomatic and kseq_test (CUT&RUN-Tools) for tag trimming, Bowtie2 for tag alignment, Samtools, bedtools and UCSC Genome Browser Tools for alignment manipulation, and MACS2 and/or SEACR for peak calling, as well as their associated language subdependencies of Java, Python2/3, R, and C++.
Pipeline Features:
  • One-step reference database preparation using a path (or URL) to a FASTA file.
  • Ability to specify groups of samples containing both treatment (Ex: H3K4me3) and control (Ex: IgG) antibody groups, with automated association of each control sample with the respective treatment samples during the peak calling step.
  • Built-in normalization protocol to normalize to a sequence library of the user's choice when spike-in DNA is used in the CUT&RUN Protocol (optional; includes an E. coli reference genome for utilization of E. coli as a spike-in control, as described by Meers et al. (eLife 2019) [see the References section of CUT&RUN-Flow's ReadTheDocs]).
  • SLURM, PBS, and many other job scheduling environments, enabled natively by Nextflow.
  • Output of memory-efficient CRAM (alignment), bedgraph (genome coverage), and bigWig (genome coverage) file formats.

CUT&RUN-Flow Pipe Flowchart

For a full list of required dependencies and tested versions, see the Dependencies section of CUT&RUN-Flow's ReadTheDocs, and for dependency configuration options see the Dependency Configuration section.

Quickstart

Here is a brief introduction on how to install and get started using the pipeline. For full details, see CUT&RUN-Flow's ReadTheDocs.

Prepare Task Directory:
Create a task directory, and navigate to it.
$ mkdir ./my_task  # (Example)
$ cd ./my_task     # (Example)
Install Nextflow (if necessary):
Download the nextflow executable to your current directory.
(You can then move the nextflow executable to a directory on your $PATH for future use.)
$ curl -s https://get.nextflow.io | bash

# For the following steps, use:
nextflow    # If nextflow executable on $PATH (assumed)
./nextflow  # If running nextflow executable from local directory
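The optional move onto $PATH mentioned above could look something like the sketch below. The target directory `$HOME/bin` is only an assumed example, and `install_on_path` is a hypothetical helper name introduced here; neither comes from Nextflow or CnR-flow.

```shell
# Sketch: move an executable into a personal bin directory and expose it on $PATH.
# ASSUMPTION: "$HOME/bin" is an example target; install_on_path is a hypothetical helper.
install_on_path() {
    src="$1"                       # e.g. ./nextflow
    dest_dir="${2:-$HOME/bin}"     # example default location
    mkdir -p "$dest_dir" || return 1
    mv "$src" "$dest_dir/" || return 1
    chmod +x "$dest_dir/$(basename "$src")" || return 1
    # Prepend to PATH for the current shell only, skipping if already present.
    case ":$PATH:" in
        *":$dest_dir:"*) ;;
        *) PATH="$dest_dir:$PATH"; export PATH ;;
    esac
}

# Usage (after the curl step above):
#   install_on_path ./nextflow
```

The PATH change lasts only for the current shell session; to keep it, add an equivalent export line to your shell startup file.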
Download and Install CnR-flow:
Nextflow will download and store the pipeline in the user's Nextflow info directory (Default: ~/.nextflow/).
$ nextflow run RenneLab/CnR-flow --mode initiate
Configure, Validate, and Test:
If using Nextflow's built-in Conda dependency handling (recommended), install Miniconda (if necessary), following the Miniconda installation instructions.
The CnR-flow configuration with Conda should then work "out-of-the-box."
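Before relying on the Conda route, it may help to confirm that a conda executable is actually reachable from your shell. The following is a minimal sketch; `require_cmd` is a hypothetical helper name, not part of Nextflow or CnR-flow.

```shell
# Sketch: report whether a command is available on $PATH.
# ASSUMPTION: require_cmd is a hypothetical helper introduced for illustration.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Usage:
#   require_cmd conda && conda --version
```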

If using an alternative configuration, see the Dependency Configuration section of CUT&RUN-Flow's ReadTheDocs for dependency configuration options.

Once dependencies have been configured, validate all dependencies:
$ nextflow run CnR-flow --mode validate_all
Fill in the required task input parameters in "nextflow.config". For detailed setup instructions, see the Task Setup section of CUT&RUN-Flow's ReadTheDocs. Additionally, for usage on SLURM, PBS, or other cluster systems, configure your system executor, time, and memory settings.
# Configure:
$ <vim/nano...> nextflow.config   # Task Input, Steps, etc. Configuration

# REQUIRED values to enter (all others *should* work as default):
# ref_fasta               (or some other ref-mode/location)
# treat_fastqs            (input paired-end fastq[.gz] file paths)
#   [OR fastq_groups]     (multi-group input paired-end .fastq[.gz] file paths)
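As a rough illustration of the executor, time, and memory settings mentioned above, the sketch below writes generic Nextflow `process` settings to a separate example file. The file name `cluster_example.config` and every value in it are assumed placeholders, not CnR-flow defaults; the task's own "nextflow.config" may already provide a place for these settings, in which case edit them there instead.

```shell
# Sketch: write generic Nextflow cluster settings to a separate example file.
# ASSUMPTION: the file name and all values are placeholders to adapt to your
# cluster; they are not taken from the CnR-flow configuration.
cat > cluster_example.config <<'EOF'
process {
    executor = 'slurm'   // or 'pbs', depending on your scheduler
    time     = '4h'      // placeholder wall-time limit per process
    memory   = '8 GB'    // placeholder memory request per process
}
EOF

# Usage (Nextflow's -c option adds a config file to the configuration set):
#   nextflow -c cluster_example.config run CnR-flow --mode dry_run
```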
Prepare and Execute Pipeline:
Prepare your reference database (and normalization reference) from .fasta[.gz] file(s):
$ nextflow run CnR-flow --mode prep_fasta
Perform a test run to check inputs, parameter setup, and process execution:
$ nextflow run CnR-flow --mode dry_run
If satisfied with the pipeline setup, execute the pipeline:
$ nextflow run CnR-flow --mode run
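The mode sequence above can also be scripted. The sketch below runs the modes you pass to it in order and stops at the first failure; `run_cnr_modes` and the `NEXTFLOW_BIN` variable are hypothetical names introduced here, and the mode names are the ones used in the steps above.

```shell
# Sketch: run the given CnR-flow modes in order, stopping at the first failure.
# ASSUMPTION: run_cnr_modes and NEXTFLOW_BIN are hypothetical names for illustration.
run_cnr_modes() {
    nf="${NEXTFLOW_BIN:-nextflow}"   # set NEXTFLOW_BIN=./nextflow for a local executable
    for mode in "$@"; do
        echo ">>> mode: $mode"
        "$nf" run CnR-flow --mode "$mode" || {
            echo "mode '$mode' failed; stopping" >&2
            return 1
        }
    done
}

# Usage:
#   run_cnr_modes validate_all prep_fasta dry_run   # then inspect the dry-run output
#   run_cnr_modes run                               # once satisfied with the setup
```

Keeping the final run as a separate call preserves the manual check between the dry run and the full execution.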
Further documentation on CUT&RUN-Flow components, setup, and usage can be found in CUT&RUN-Flow's ReadTheDocs.
