
cnr-flow's Introduction

CUT&RUN-Flow (CnR-flow)

[Badges: latest GitHub release, CircleCI build status, ReadTheDocs documentation status, Nextflow >= 20.10.6 required, GNU GPLv3+ license, Zenodo DOI: 10.5281/zenodo.4015698]
Welcome to CUT&RUN-Flow (CnR-flow), a Nextflow pipeline for QC, tag trimming, normalization, and peak calling for paired-end sequencing data from CUT&RUN experiments.
This software is available via GitHub at http://www.github.com/RenneLab/CnR-flow.
Full project documentation is available at CUT&RUN-Flow's ReadTheDocs.
Pipeline Design:
CUT&RUN-Flow is built using Nextflow, a powerful domain-specific workflow language built to create flexible and efficient bioinformatics pipelines. Nextflow provides extensive flexibility in utilizing cluster computing environments such as PBS and SLURM, and in automated and compartmentalized handling of dependencies using Conda / Bioconda, Docker, Singularity or Environment Modules.
Dependencies:
In addition to local configurations, Nextflow handles dependencies in separate working environments within the same pipeline, either using Conda or Environment Modules on your system, or using container-encapsulated execution with Docker or Singularity. CnR-flow is pre-configured to auto-acquire dependencies with no additional setup, either using Conda recipes from the Bioconda project, or by using Docker or Singularity to execute Docker images hosted by the BioContainers project (Bioconda; BioContainers).
CUT&RUN-Flow utilizes UCSC Genome Browser Tools and Samtools for reference library preparation, FastQC for tag quality control, Trimmomatic for tag trimming, Bowtie2 for tag alignment, Samtools, bedtools and UCSC Genome Browser Tools for alignment manipulation, and MACS2 and/or SEACR for peak calling, as well as their associated language subdependencies of Java, Python2/3, R, and C++.
Pipeline Features:
  • One-step reference database preparation using a path (or URL) to a FASTA file.
  • Ability to specify groups of samples containing both treatment (Ex: H3K4me3) and control (Ex: IgG) antibody groups, with automated association of each control sample with the respective treatment samples during the peak-calling step (see the configuration sketch after this list)
  • Built-in normalization protocol to normalize to a sequence library of the user's choice when spike-in DNA is used in the CUT&RUN protocol (optional; includes an E. coli reference genome for utilization of E. coli as a spike-in control, as described by Meers et al. (eLife 2019) [see the References section of CUT&RUN-Flow's ReadTheDocs])
  • OR: CPM-normalization to normalize total read counts between samples (beta).
  • SLURM, PBS... and many other job scheduling environments enabled natively by Nextflow
  • Output of memory-efficient CRAM (alignment), bedgraph (genome coverage), and bigWig (genome coverage) file formats
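As an illustration of the sample-group feature above, here is a minimal sketch of a fastq_groups entry for the params block of "nextflow.config". The group name and path patterns are hypothetical placeholders; see the Task Setup section of CUT&RUN-Flow's ReadTheDocs for the authoritative syntax.

    fastq_groups = [
        // Hypothetical group name; each group pairs treatment sample(s) with a control.
        'h3k4me3_experiment': [
            'treat': 'relpath/to/h3k4me3_rep*_R{1,2}*.fastq.gz',  // hypothetical treatment (Ex: H3K4me3) path pattern
            'ctrl':  'relpath/to/igg_rep*_R{1,2}*.fastq.gz'       // hypothetical control (Ex: IgG) path pattern
        ]
    ]

During the peak-calling step, the control sample of each group is then associated automatically with that group's treatment samples.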

CUT&RUN-Flow Pipe Flowchart

For a full list of required dependencies and tested versions, see the Dependencies section of CUT&RUN-Flow's ReadTheDocs, and for dependency configuration options see the Dependency Configuration section.

Quickstart

Here is a brief introduction on how to install and get started using the pipeline. For full details, see CUT&RUN-Flow's ReadTheDocs.

Prepare Task Directory:
Create a task directory, and navigate to it.
$ mkdir ./my_task  # (Example)
$ cd ./my_task     # (Example)
Install Nextflow (if necessary):
Download the nextflow executable to your current directory.
(You can move the nextflow executable to a directory on your $PATH for future use.)
$ curl -s https://get.nextflow.io | bash

# For the following steps, use:
nextflow    # If nextflow executable on $PATH (assumed)
./nextflow  # If running nextflow executable from local directory
Download and Install CnR-flow:
Nextflow will download and store the pipeline in the user's Nextflow info directory (Default: ~/.nextflow/)
$ nextflow run RenneLab/CnR-flow --mode initiate
Configure, Validate, and Test:
Conda:
  • Install Miniconda, if necessary (installation instructions: https://docs.conda.io/en/latest/miniconda.html).
  • The CnR-flow configuration with Conda should then work "out-of-the-box."
Docker:
  • Add '-profile docker' to all nextflow commands
Singularity:
  • Add '-profile singularity' to all nextflow commands
If using an alternative configuration, see the Dependency Configuration section of CUT&RUN-Flow's ReadTheDocs for dependency configuration options.

Once dependencies have been configured, validate all dependencies:
# Conda or other configs:
$ nextflow run CnR-flow --mode validate_all

# OR Docker Configuration:
$ nextflow run CnR-flow -profile docker --mode validate_all

# OR Singularity Configuration:
$ nextflow run CnR-flow -profile singularity --mode validate_all
Fill in the required task input parameters in "nextflow.config". For detailed setup instructions, see the Task Setup section of CUT&RUN-Flow's ReadTheDocs. Additionally, for usage on SLURM, PBS, or other cluster systems, configure your system executor, time, and memory settings (a configuration sketch follows the notes below).
# Configure:
$ <vim/nano...> nextflow.config   # Task Input, Steps, etc. Configuration

#REQUIRED values to enter (all others *should* work as default):
# ref_fasta               (or some other ref-mode/location)
# treat_fastqs            (input paired-end fastq[.gz] file paths)
#   [OR fastq_groups]     (multi-group input paired-end .fastq[.gz] file paths)
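Below is a minimal sketch of what the corresponding "nextflow.config" entries might look like. The paths are hypothetical placeholders, and the process-level executor/memory/time settings shown are standard Nextflow options; CnR-flow's own per-step resource parameters may be named differently (see the Task Setup section of the documentation).

    params {
        ref_fasta    = '/path/to/genome.fa.gz'               // reference FASTA path or URL (hypothetical)
        treat_fastqs = 'relpath/to/treat*_R{1,2}*.fastq.gz'  // paired-end treatment fastq[.gz] paths (hypothetical)
        // OR: fastq_groups = [ ... ]                        // multi-group input; see the sketch in Pipeline Features
    }

    process {
        executor = 'slurm'   // or 'pbs', 'local', ...
        memory   = '8 GB'    // example cluster resource settings; CnR-flow's per-step
        time     = '8h'      // resource parameters may differ (see the Task Setup docs)
    }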
Prepare and Execute Pipeline:
Prepare your reference database (and normalization reference) from .fasta[.gz] file(s):
$ nextflow run CnR-flow --mode prep_fasta
Perform a test run to check inputs, parameter setup, and process execution:
$ nextflow run CnR-flow --mode dry_run
If satisfied with the pipeline setup, execute the pipeline:
$ nextflow run CnR-flow --mode run
Further documentation on CUT&RUN-Flow components, setup, and usage can be found in CUT&RUN-Flow's ReadTheDocs.

cnr-flow's People

Contributors: dstrib

cnr-flow's Issues

error at 'CnR_S2_A_Aln_Ref (treat_1)'

Hello again, when I run a test for one sample with a treat file, I get this error when the pipeline arrives at the alignment step:
[Screenshot of the alignment error]

Running the same command in a local environment (with Bowtie2 2.4.2) works perfectly. I tried several Docker images of Bowtie2 and always got the same error.
Thanks in advance!

Unknown variable 'R2_Files' -- Make sure it is not misspelt and defined somewhere in the script before using it

Hello, when I use CnR-flow for one sample with a treat file but no ctrl file, it detects the input but reports Unknown variable 'R2_Files'.
My command is :
nextflow run CnR-flow --mode run
The output is:
"N E X T F L O W ~ version 20.10.0
Launching RenneLab/CnR-flow [golden_euler] - revision: aa7cf33 [master]

Utilizing Run Mode: run

-- Preparing Workflow Environment --

2 Input Files Detected.

-- Executing Workflow --

[- ] process > CnR_S0_A_GetSeqLen [ 0%] 0 of 1
[- ] process > CnR_S0_B_MergeFastqs -
[- ] process > CnR_S0_C_FastQCPre -
[- ] process > CnR_S1_A_Trim -
[- ] process > CnR_S1_B_Retrim -
[- ] process > CnR_S1_C_FastQCPost -
[- ] process > CnR_S2_A_Aln_Ref -
[- ] process > CnR_S2_B_Modify_Aln -
[- ] process > CnR_S2_C_Make_Bdg -
[- ] process > CnR_S3_A_Aln_Spike -
[- ] process > CnR_S3_B_Norm_Bdg -
[- ] process > CnR_S4_A_Make_BigWig -
[- ] process > CnR_S5_A_Peaks_MACS -
[- ] process > CnR_S5_B_Peaks_SEACR -
Error executing process > 'CnR_S0_B_MergeFastqs (Aged_HSC_CTCF_10)'

Caused by:
Unknown variable 'R2_Files' -- Make sure it is not misspelt and defined somewhere in the script before using it

Source block:
run_id = "${task.tag}.${task.process}"
out_log_name = "${run_id}.nf.log.txt"
merge_fastqs_dir = "${params.merge_fastqs_dir}"
R1_files = fastq.findAll {fn ->
"${fn}".contains("R1") || "${fn}".contains("_1.f") || "${fn}".contains("1")
}
R2_files = fastq.findAll {fn ->
"${fn}".contains("R2") || "${fn}".contains("_2.f") || "${fn}".contains("2")
}
R1_out_file = "${params.merge_fastqs_dir}/${name}_R1_001.fastq.gz"
R2_out_file = "${params.merge_fastqs_dir}/${name}_R2_001.fastq.gz"
if( R1_files.size() < 1 || R2_files.size() < 1 ) {
message = "Merge Error:\nR1 Files: ${R1_files}\nR2 Files: ${R2_Files}"
throw new Exception(message)
}
check_command = ""
if( "${R1_files[0]}".endsWith('.gz') ) {
check_command = "gzip -vt ${R1_files.join(' ')} ${R2_files.join(' ')}"
}
if( R1_files.size() == 1 && R2_files.size() == 1 ) {
command = '''
echo "No Merge Necessary. Renaming Files..."
mkdir !{merge_fastqs_dir}

              # Check File Integrity if gzipped
              !{check_command}

              set -v -H -o history
              mv -v "!{R1_files[0]}" "!{R1_out_file}"
              mv -v "!{R2_files[0]}" "!{R2_out_file}"
              set +v +H +o history
              R1_OUT_LEN=$(zcat -f !{R1_out_file} | wc -l)
              R2_OUT_LEN=$(zcat -f !{R2_out_file} | wc -l)
              echo "R1 Lines: ${R1_OUT_LEN}"
              echo "R2 Lines: ${R2_OUT_LEN}"
              if [ "${R1_OUT_LEN}" == "0" || "${R2_OUT_LEN}" == "0" ]; then
                  echo "Input file of zero length detected."
                  exit 1
              fi
              '''
          } else {
              command = '''
              mkdir !{merge_fastqs_dir}
              mkdir !{merge_fastqs_dir}

              # Check File Integrity if gzipped
              !{check_command}

              echo -e "\\nCombining Files: !{R1_files.join(' ')}"
              echo "    Into: !{R1_out_file}"
              set -v -H -o history
              cat '!{R1_files.join("' '")}' > '!{R1_out_file}'
              set +v +H +o history

              echo -e "\\nCombining Files: !{R2_files.join(' ')}"
              echo "    Into: !{R2_out_file}"
              set -v -H -o history
              cat '!{R2_files.join("' '")}' > '!{R2_out_file}'
              set +v +H +o history

              R1_OUT_LEN=$(zcat -f !{R1_out_file} | wc -l)
              R2_OUT_LEN=$(zcat -f !{R2_out_file} | wc -l)
              echo "R1 Lines: ${R1_OUT_LEN}"
              echo "R2 Lines: ${R2_OUT_LEN}"
              if [ "${R1_OUT_LEN}" == "0" || "${R2_OUT_LEN}" == "0" ]; then
                  echo "Input file of zero length detected."
                  exit 1
              fi
              '''
          }

command

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line
"
So what's wrong?
Thanks.
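For reference, the error quoted above appears to stem from a case mismatch in the quoted source block: the message string interpolates ${R2_Files}, while the variable defined a few lines earlier is R2_files, and Groovy variable names are case-sensitive. A minimal sketch of the corrected lines (changed only in variable case) would be:

    if( R1_files.size() < 1 || R2_files.size() < 1 ) {
        // Use the same case as the variables defined above (R1_files / R2_files);
        // Groovy treats R2_Files and R2_files as different variables.
        message = "Merge Error:\nR1 Files: ${R1_files}\nR2 Files: ${R2_files}"
        throw new Exception(message)
    }

Note that this branch only executes when no R1 or R2 file was matched, so the underlying trigger is likely that the input file names do not contain the "R1"/"R2" (or "_1.f"/"_2.f") patterns that the findAll calls look for.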

Question on CnR run

Hello,

Thanks for making this pipeline, I'm excited to give it a try.
After setting up the genome FASTA and FASTQ file paths, I tried prep_fasta and list_refs, and they both worked well, but neither dry_run nor run seems to work. Here is what it prints:

N E X T F L O W ~ version 20.10.0
Launching RenneLab/CnR-flow [jolly_elion] - revision: fb398b9301 [master]
Utilizing Run Mode: dry_run
-- Preparing Workflow Environment --
74 Input Files Detected.
-- Executing Workflow --

N E X T F L O W ~ version 20.10.0
Launching RenneLab/CnR-flow [special_ride] - revision: fb398b9301 [master]
Utilizing Run Mode: run
-- Preparing Workflow Environment --
74 Input Files Detected.
-- Executing Workflow --

Could you please provide some guidance on where to look for the problem? Thank you!

Best,
Ellie

Setting-up config file for validate_all

Hi,

Many thanks for creating such a wonderful workflow for CUT&RUN analysis. I am excited to try this workflow; this is my first attempt at using Nextflow, so perhaps I am missing some steps that are quite obvious. Any pointers would be most appreciated.

I am encountering an error while running nextflow run CnR-flow --mode validate_all

[Screenshot of the validate_all error, 2021-06-15]

My nextflow.config file (.nextflow/assets/RenneLab/CnR-flow/nextflow.config) looks like below.

params {
    // ------- Dependency Configuration --------
    // Configuration using conda is recommended for most systems.
    // Each dependency can only have one type of resource configured:
    // (Ex: bowtie2_module OR bowtie2_conda)

    // Dependency Configuration Using Anaconda
    // Miniconda Install Instructions:
    //     https://docs.conda.io/en/latest/miniconda.html
    //
    // -- External Conda Environments:
    facount_conda          = 'bioconda::ucsc-facount=366'
    bowtie2_conda          = 'bioconda::bowtie2=2.4.1'
    fastqc_conda           = 'bioconda::fastqc=0.11.9'
    trimmomatic_conda      = 'bioconda::trimmomatic=0.39'
    kseqtest_conda         = 'RenneLab::cutruntools-exec'
    bedtools_conda         = 'bioconda::bedtools=2.29.2'
    macs2_conda            = 'bioconda::macs2=2.2.6'
    R_conda                = 'r=3.6.0'
    seacr_conda            = "${params.R_conda} ${params.bedtools_conda}"
    samtools_conda         = 'bioconda::samtools=1.9'
    bedgraphtobigwig_conda = 'conda-forge::libpng conda-forge::libuuid conda-forge::mysql-connector-c conda-forge::openssl conda-forge::zlib bioconda::ucsc-bedgraphtobigwig=377'

My guess is that I did not have bowtie2 installed in conda, so I tried to install it with conda install -c bioconda bowtie2=2.4.1. Checking whether the bowtie2 package is available with conda list bowtie2 gave the following result:

# packages in environment at /home/zfadlullah/anaconda3:
#
# Name                    Version                   Build  Channel
bowtie2                   2.4.1            py38he513fc3_0    bioconda

I think bowtie2 is installed, so I am not sure why I am getting the error.

Best,
Zaki
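One point worth noting when debugging this kind of error: CnR-flow builds separate per-process Conda environments from the *_conda parameters shown above, so a bowtie2 package installed into the base Anaconda environment is not necessarily the one the validate step checks. If the auto-built environment is the problem, a possible adjustment (a sketch only; the module value is a hypothetical placeholder, and per the comment in the quoted config each dependency may use either the _conda or the _module form, not both) is:

    params {
        // Pin bowtie2 to a Bioconda build that resolves on your system:
        bowtie2_conda  = 'bioconda::bowtie2=2.4.1'
        // OR, on an HPC system with Environment Modules, use the module alternative instead
        // (hypothetical module name; only one of the two may be set for a given dependency):
        // bowtie2_module = 'bowtie2/2.4.1'
    }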

Running RenneLab/CnR-flow in mode validate: call process CnR_Validate (bowtie2) singularity image - string parsing error #127

Firstly, I'd like to thank you for this amazing work. I'm trying to apply the pipeline to analyze ChIP-seq data. When attempting to run the pipeline in validate mode, which pulls the Docker images predefined in "nextflow.config" as Singularity containers, the 'bowtie2' image gives a container string parsing error with exit status 127. When I replaced the version with a more recent one, I got the same error with 'samtools', then with 'bedgraphtobigwig' and other Docker images.

[Screenshots of the container string parsing error]

I appreciate your time and effort. Many thanks.

System information:
nextflow version : 22.04.5.5708
HPC
Local and Slurm executor were tested
Singularity: 3.8.7-1.el7
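As general context for this report (not a confirmed fix): an exit status of 127 from a shell normally means "command not found", which on HPC systems often indicates that the singularity executable is not available on the node where the task runs. A minimal sketch of standard Nextflow Singularity settings that could be checked in "nextflow.config" is shown below; the beforeScript module name is a hypothetical placeholder for whatever your cluster provides.

    singularity {
        enabled    = true
        autoMounts = true
    }

    process {
        // Hypothetical: make the singularity executable available on compute nodes
        // if your cluster provides it via Environment Modules.
        beforeScript = 'module load singularity'
    }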

Example file: a misplaced comma

I'd just like to point out a misplaced comma in the example file for using fastq_groups:

Example:
// fastq_groups = [
// 'group_1_name': ['treat': 'relpath/to/treat1*R{1,2}*',
// 'ctrl': 'relpath/to/ctrl1*R{1,2}*',
// ]
// 'group_2_name': ['treat': ['relpath/to/g2_treat1*R{1,2}*'
// '/abs/path/to/g2_treat2*R{1,2}*'
// ],
// 'ctrl': 'relpath/to/g2_ctrl1*R{1,2}*'
// ]
// ]

It runs well for multiple groups after moving the comma outside of the "]":

Example:
// fastq_groups = [
// 'group_1_name': ['treat': 'relpath/to/treat1*R{1,2}*',
// 'ctrl': 'relpath/to/ctrl1*R{1,2}*'
// ],
// 'group_2_name': ['treat': ['relpath/to/g2_treat1*R{1,2}*'
// '/abs/path/to/g2_treat2*R{1,2}*'
// ],
// 'ctrl': 'relpath/to/g2_ctrl1*R{1,2}*'
// ]
// ]
Thanks!
