
nf-core/raredisease

GitHub Actions CI Status | GitHub Actions Linting Status | AWS CI | Cite with Zenodo | nf-test

Nextflow | run with conda | run with docker | run with singularity | Launch on Seqera Platform

Get help on Slack | Follow on Twitter | Follow on Mastodon | Watch on YouTube


Introduction

nf-core/raredisease is a best-practice bioinformatic pipeline for calling and scoring variants from WGS/WES data from rare disease patients. This pipeline is heavily inspired by MIP.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.

Pipeline summary

nf-core/raredisease workflow

1. Metrics:

2. Alignment:

3. Variant calling - SNV:

4. Variant calling - SV:

5. Annotation - SNV:

6. Annotation - SV:

7. Mitochondrial analysis:

8. Variant calling - repeat expansions:

9. Variant calling - mobile elements:

10. Rank variants - SV and SNV:

11. Variant evaluation:

Note that it is possible to include/exclude certain tools or steps.

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,lane,fastq_1,fastq_2,sex,phenotype,paternal_id,maternal_id,case_id
hugelymodelbat,1,reads_1.fastq.gz,reads_2.fastq.gz,1,2,,,justhusky

Each row represents a fastq file (single-end) or a pair of fastq files (paired-end).

Second, ensure that you have defined the path to reference files and parameters required for the type of analysis that you want to perform. More information about this can be found here.

Now, you can run the pipeline using:

nextflow run nf-core/raredisease \
   -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
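
As an illustration, parameters can be collected in a YAML file and passed with -params-file; this is only a sketch, and the file name and parameter values below are placeholders:

# params.yaml is a hypothetical file name; input and outdir values are placeholders
cat > params.yaml <<'EOF'
input: "samplesheet.csv"
outdir: "results"
EOF

nextflow run nf-core/raredisease \
   -profile docker \
   -params-file params.yaml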

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline output

For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/raredisease was written in a collaboration between the Clinical Genomics nodes in Sweden, with major contributions from Ramprasad Neethiraj, Anders Jemt, Lucia Pena Perez, and Mei Wu at Clinical Genomics Stockholm.

Additional contributors were Sima Rahimi, Gwenna Breton and Emma Västerviga (Clinical Genomics Gothenburg); Halfdan Rydbeck and Lauri Mesilaakso (Clinical Genomics Linköping); Subazini Thankaswamy Kosalai (Clinical Genomics Örebro); Annick Renevey and Peter Pruisscher (Clinical Genomics Stockholm); Ryan Kennedy (Clinical Genomics Lund); Anders Sune Pedersen (Danish National Genome Center) and Lucas Taniguti.

We thank the nf-core community for their extensive assistance in the development of this pipeline.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #raredisease channel (you can join with this invite).

Citations

If you use nf-core/raredisease for your analysis, please cite it using the following doi: 10.5281/zenodo.7995798

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

You can read more about MIP's use in healthcare in:

Stranneheim H, Lagerstedt-Robinson K, Magnusson M, et al. Integration of whole genome sequencing into a healthcare setting: high diagnostic rates across multiple clinical entities in 3219 rare disease patients. Genome Med. 2021;13(1):40. doi:10.1186/s13073-021-00855-5


raredisease's Issues

Bcftools norm

Is your feature request related to a problem? Please describe

Normalize and split multi-allelic variants using bcftools norm prior to annotation.

Describe the solution you'd like

Incorporate the bcftools norm module from nf-core modules.
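
For reference, the command wrapped by that module boils down to something like this (file names are placeholders):

# split multi-allelic records and left-align/normalize against the reference
bcftools norm -m -any -f reference.fasta -Oz -o normalized.vcf.gz input.vcf.gz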

Describe alternatives you've considered

We could use vt decompose and normalize

Additional context

SamToFastq

Convert a SAM or BAM file to FastQ.
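
The underlying Picard call is roughly the following (file names are placeholders; paired-end output shown):

picard SamToFastq \
    I=input.bam \
    FASTQ=sample_R1.fastq.gz \
    SECOND_END_FASTQ=sample_R2.fastq.gz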

add tiddit/sv

Description of feature

In MIP, we combine callsets from manta, tiddit/sv, and cnvnator using svdb. We should add tiddit/sv.
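
A TIDDIT SV call amounts to roughly the following (flags follow the TIDDIT 3.x style, file names are placeholders; check the TIDDIT docs for the exact interface of the pinned version):

# call structural variants from an aligned BAM
tiddit --sv --bam sample.bam --ref reference.fasta -o sample_tiddit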

Add TIDDIT/cov module

Our in-house pipeline (MIP) uses this tool and we want to add this to the nextflow pipeline. It's not part of nf-core modules yet: nf-core/modules#792.

Once the module is added to nf-core/modules, it'll be added to the qc_bam.nf subworkflow.
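
For orientation, the coverage call itself looks roughly like this (bin size and file names are placeholders; verify the flags against the TIDDIT version actually used):

# compute per-bin coverage from an aligned BAM
tiddit --cov --bam sample.bam -z 500 -o sample_cov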

VEP

Is your feature request related to a problem? Please describe

Add VEP from nf-core modules

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Refactor alignment modules

Description of feature

Currently, the code snippet below lives in the raredisease.nf script, but once more mappers/aligners are in the picture we should hide the logic away in a larger subworkflow with switches for selecting which tool to use. This way we can declutter the raredisease.nf script.

if (params.aligner == 'bwamem2') {
        ALIGN_BWAMEM2 (
            INPUT_CHECK.out.reads,
            PREPARE_GENOME.out.bwamem2_index
        )
...

turns into...

if (aligner == 'bwamem2') {
        ALIGN_BWAMEM2 (
            reads,
            bwamem2_index
        )
...

stowed in the bigger subworkflow 👍 - where aligner is an input defined in the take: block.

RevertSam (GATK/Picard)

Produce an unmapped BAM (uBAM) from an aligned BAM.
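
The corresponding Picard call is roughly the following (file names are placeholders):

# strip alignment information to produce a uBAM
picard RevertSam \
    I=aligned.bam \
    O=unmapped.bam \
    SANITIZE=true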

New modules required: Sentieon

Here is a list of Sentieon tools that are relevant for the pipeline and for which issues have been opened in https://github.com/nf-core/modules.

Another tool that might be relevant but for which there is no open issue at the moment:

  • WgsMetricsAlgo

Add Picard CollectHsMetrics to the BamQC subworkflow

Our in-house pipeline uses this tool and we want to add this to the nextflow pipeline. It's not part of nf-core modules yet: nf-core/modules#793.

Once the module is added to nf-core/modules, it'll be added to the qc_bam.nf subworkflow.

EDIT: The module is part of nf-core/modules now. Please go ahead and add it to the qc_bam.nf subworkflow.
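
The wrapped command is roughly the following (interval lists and file names are placeholders):

# hybrid-selection metrics for a capture experiment
picard CollectHsMetrics \
    I=sample.bam \
    O=sample_hs_metrics.txt \
    R=reference.fasta \
    BAIT_INTERVALS=baits.interval_list \
    TARGET_INTERVALS=targets.interval_list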

Create subworkflow to prepare indices

Is your feature request related to a problem? Please describe

The pipeline currently rebuilds the bwamem2 index on every run. To save resources, there should be a check for existing indices that can be used instead.

Describe the solution you'd like

This subworkflow should 1) check for existing reference index files, and 2) allow indices to be re-used by different downstream processes.
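
From the command line this could look something like the sketch below; note that --bwamem2_index is a hypothetical parameter name used for illustration only, not an agreed interface:

# --bwamem2_index is a hypothetical flag; the real parameter name is to be decided
nextflow run nf-core/raredisease \
   -profile docker \
   --input samplesheet.csv \
   --outdir results \
   --fasta reference.fasta \
   --bwamem2_index /path/to/existing/bwamem2_index_dir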

Describe alternatives you've considered

Additional context

Add vcfanno to the annotation workflow

Is your feature request related to a problem? Please describe

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Java memory issue on SLURM

Check Documentation

I have checked the following places for your error:

Description of the bug

Steps to reproduce

Steps to reproduce the behaviour:

  1. nextflow run nf-core/raredisease -profile test,singularity,hasta,dev_prio -r dev (-c customconf.conf )
  2. See error:

Without customconf.conf

[dd/f36687] NOTE: Process `NFCORE_RAREDISEASE:RAREDISEASE:ALIGN_BWAMEM2:MARKDUPLICATES (1234N)` terminated with an error exit status (134) -- Execution is retried (1)
WARN: Input tuple does not match input set cardinality declared by process `NFCORE_RAREDISEASE:RAREDISEASE:DEEPVARIANT_CALLER:GLNEXUS` -- offending value: [id:caseydonkey]
Error executing process > 'NFCORE_RAREDISEASE:RAREDISEASE:ALIGN_BWAMEM2:MARKDUPLICATES (1234N)'

Caused by:
  Process `NFCORE_RAREDISEASE:RAREDISEASE:ALIGN_BWAMEM2:MARKDUPLICATES (1234N)` terminated with an error exit status (134)

Command executed:

  picard \
      -Xmx6g \
      MarkDuplicates \
      --CREATE_INDEX \
      -I 1234N.bam \
      -O 1234N_sorted.bam \
      -M 1234N_sorted.MarkDuplicates.metrics.txt

  cat <<-END_VERSIONS > versions.yml
  MARKDUPLICATES:
      markduplicates: $(echo $(picard MarkDuplicates --version 2>&1) | grep -o 'Version:.*' | cut -f2- -d:)
  END_VERSIONS

Command exit status:
  134

Command output:
  #
  # A fatal error has been detected by the Java Runtime Environment:
  #
  #  Internal Error (g1PageBasedVirtualSpace.cpp:43), pid=211157, tid=211219
  #  guarantee(rs.is_reserved()) failed: Given reserved space must have been reserved already.
  #
  # JRE version:  (11.0.9.1) (build )
  # Java VM: OpenJDK 64-Bit Server VM (11.0.9.1-internal+0-adhoc..src, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
  # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
  #
  # An error report file with more information is saved as:
  # hs_err_pid211157.log
  #
  #

Command error:
  /usr/local/bin/picard: line 5: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory
  /usr/local/bin/picard: line 66: 211157 Aborted                 /usr/local/bin/java -Xmx6g -jar /usr/local/share/picard-2.25.7-0/picard.jar MarkDuplicates "--CREATE_INDEX" "-I" "1234N.bam" "-O" "1234N_sorted.bam" "-M" "1234N_sorted.MarkDuplicates.metrics.txt"

With customconf.conf:

process {
    withName: PICARD_MARKDUPLICATES {
        memory = 5.GB
    }
}
Error executing process > 'NFCORE_RAREDISEASE:RAREDISEASE:ALIGN_BWAMEM2:MARKDUPLICATES (1234N)'

Caused by:
  Process `NFCORE_RAREDISEASE:RAREDISEASE:ALIGN_BWAMEM2:MARKDUPLICATES (1234N)` terminated with an error exit status (1)

Command executed:

  picard \
      -Xmx5g \
      MarkDuplicates \
      --CREATE_INDEX \
      -I 1234N.bam \
      -O 1234N_sorted.bam \
      -M 1234N_sorted.MarkDuplicates.metrics.txt

  cat <<-END_VERSIONS > versions.yml
  MARKDUPLICATES:
      markduplicates: $(echo $(picard MarkDuplicates --version 2>&1) | grep -o 'Version:.*' | cut -f2- -d:)
  END_VERSIONS

Command exit status:
  1

Command output:
  Error occurred during initialization of VM
  Could not reserve enough space for 5242880KB object heap

Command error:
  /usr/local/bin/picard: line 5: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory

Expected behaviour

Successful completion of the analysis

Log files

Have you provided the following extra information/files:

  • The command used to run the pipeline
  • The .nextflow.log file

System

  • Hardware: HPC, hasta
  • Executor: slurm
  • OS: CentOS
  • Version: 7

Nextflow Installation

  • Version: 21.04.3.5560

Container engine

  • Engine: singularity
  • version: 3.1.1-1.el7

Quick fix that solves the problem until a more elegant solution is found

modules/nf-core/modules/picard/markduplicates/main.nf:
avail_mem = task.memory.giga - 2   // leave ~2 GB of headroom for JVM overhead outside the heap

Next related issue

Similar error for bamqc.

Additional context

For the first error, markduplicates:

nextflow-customconf.log
nextflow-no-customconf.log

Update the way module versions are emitted

Is your feature request related to a problem? Please describe

nf-core/modules updated the way versions are emitted, from <software>.version.txt to versions.yml. This allows multiple versions to be emitted in cases where a module or subworkflow uses multiple tools. Updated documentation here.

This pipeline is not updated accordingly yet.

Describe the solution you'd like

Update the subworkflows and main workflow 😄 accordingly
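
In a module's script block the new-style emission looks roughly like the snippet below (modeled on the MarkDuplicates example further up; the module name and tools shown are only illustrative):

cat <<-END_VERSIONS > versions.yml
BCFTOOLS_NORM:
    bcftools: $(bcftools --version 2>&1 | head -n1 | sed 's/^bcftools //')
    tabix: $(tabix --version 2>&1 | head -n1 | sed 's/^.*(htslib) //')
END_VERSIONS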

Describe alternatives you've considered

Additional context

Add Vcfanno to nf-core modules

Is your feature request related to a problem? Please describe

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Add feature/MarkDuplicates

Is your feature request related to a problem? Please describe

Describe the solution you'd like

Add this finishing touch to the mapping subworkflow so that the preprocessing of BAM files is complete before branching out into other tools, e.g. variant callers.

Describe alternatives you've considered

Additional context

Draft overview of future pipeline

nf-raredisease

This overview is based on the WGS/WES rare disease pipeline (MIP) that is currently in use at Clinical Genomics Stockholm. This outlines the basic functionality and modules that we would like to have from a pipeline specialised for calling, annotating and scoring variants relevant for rare disease patients.

Overview

FASTQ files are prepared for variant calling by alignment with bwa-mem/bwa-mem2 followed by MarkDuplicates. From this point the workflow is split into an SNV/indel part and an SV part.

SNV/Indels

SNV/indels are primarily called with Deepvariant and Glnexus, with the possibility of turning on the GATK Haplotypecaller workflow. These two callsets can be combined into one for maximum sensitivity. Vcfanno annotates the callset with population allele frequencies (gnomAD) and predicted pathogenicity (CADD). Common variation is removed from the callset and CADD scores are calculated for indels. VEP is used for transcript annotation, including annotation with ClinVar, SpliceAI and pLI scores. The SNV/indels are split into a clinical callset and a research callset based on a BED file with genes of interest. Finally, the variants are ranked for predicted pathogenicity based on their annotations as well as their modes of inheritance.

SV

We use Cnvnator, Manta, (Delly) and Tiddit to call structural variants. Using SVDB we combine the variants into one callset, and using a local frequency database we remove common variants and sequencing/calling artefacts. The callset is annotated with vcfanno and VEP, followed by a split into a clinical callset and a research callset. The SVs are then ranked in the same manner as the SNVs.

But wait there's more

Aside from SNVs and SVs, the pipeline identifies and visualizes runs of homozygosity/autozygosity as well as uniparental disomies (UPDs). Also included are the identification and annotation of pathogenic STRs with ExpansionHunter and Stranger. SMNCopyNumberCaller is used to diagnose patients with spinal muscular atrophy.

The tools mentioned here are not set in stone and we are certainly open to adding and changing tools as we continue development. Below is a list of tools used in the workflow.

Bcftools
BedTools
BWA
CADD
Chanjo
Chromograph
Cnvnator
Cyrius
Delly
Deepvariant
Expansionhunter
FastQC
GATK
GENMOD
Gffcompare
Glnexus
Manta
MultiQC
Peddy
PicardTools
PLINK
Rhocall
Sambamba
Samtools
SMNCopyNumberCaller
Stranger
Svdb
Telomerecat
Tiddit
Upd
Vcf2cytosure
Vcfanno
VEP

Create an interactive chart to use during pipeline development

To complement the project board, we would like to have an interactive chart that reflects the progress of the development work and that can be easily modified, e.g. when we want to include more tools.
nf-core recommends LucidChart or Google Drawings for such a task. For the moment we are going with Google Drawings.

Add svdb/merge to pipeline

Description of feature

Add this to call_structural_variants.nf to combine VCFs from manta, cnvpytor, and tiddit.
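
The merge step itself is roughly the following (file names are placeholders):

# combine per-caller SV callsets into one VCF
svdb --merge --vcf manta.vcf cnvpytor.vcf tiddit.vcf > combined_svs.vcf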

Gens preprocessing

Description of feature

Add preprocessing for Gens to the pipeline (a command-level sketch of the GATK steps follows the checklist below).

  • GATK CollectReadCounts added to nfcore/modules
  • GATK DenoiseReadCounts added to nfcore/modules
  • Gens perl-scripts added as a local module
  • Local subworkflow added
  • Subworkflow added to main workflow
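
The first two checklist items correspond roughly to the following GATK calls (the interval list, panel of normals and file names are placeholders):

# collect per-interval read counts
gatk CollectReadCounts \
    -I sample.bam \
    -L preprocessed_intervals.interval_list \
    --interval-merging-rule OVERLAPPING_ONLY \
    -O sample.counts.hdf5

# denoise the counts against a panel of normals
gatk DenoiseReadCounts \
    -I sample.counts.hdf5 \
    --count-panel-of-normals panel_of_normals.hdf5 \
    --standardized-copy-ratios sample.standardizedCR.tsv \
    --denoised-copy-ratios sample.denoisedCR.tsv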

Parse input vcf to check for normalization

Is your feature request related to a problem? Please describe

We need to know that the input VCFs used in, for example, the annotation process have been decomposed.

Describe the solution you'd like

Write a small script that parses the header and checks for the bcftools norm command line.
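
A minimal sketch of such a check, relying on the ##bcftools_normCommand header line that bcftools norm leaves behind (the file name is a placeholder):

# fail if the VCF header carries no record of bcftools norm having been run
if ! bcftools view -h input.vcf.gz | grep -q '^##bcftools_normCommand='; then
    echo "ERROR: input.vcf.gz does not appear to have been normalized with bcftools norm" >&2
    exit 1
fi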

Describe alternatives you've considered

Additional context

Include a default variant catalog file

Maybe a default file for variant_catalog (in case the user doesn't provide one) should still be added? What do you think? In case this should be included in this merge, I can try to look into how this could be done in prepare_genome.nf.

It's not a bad idea. However, I think we can go ahead and merge this one and add that option in a small PR later. We could bundle it with the pipeline or have it as a URL: https://raw.githubusercontent.com/Illumina/ExpansionHunter/master/variant_catalog/hg19/variant_catalog.json
There has also been a discussion about adding a download workflow which would automatically download all the references.

Originally posted by @jemten in #51 (comment)

Adding read groups to meta

It would be good to add read_group to meta so that bwa_mem2 can use it, and other future programs too (e.g. peddy needs it).

I have tested a little bit the addition of the line:
meta.read_group = "'@rg\tID:" + row.sample + "_" + row.fastq_1.split('/')[-1].split('R1*.fastq')[0] + "_" + row.lane + "\tPL:ILLUMINA\tSM:" + row.sample.split('_')[0] + "'"
in subworkflows/local/input_check.nf
But it creates issues when GLnexus needs to combine the different channels again (see nextflow log).
This problem does, however, not arise with:
meta.read_group = "'@rg\tID:myid\tPL:ILLUMINA\tSM:" + row.sample.split('_')[0] + "'"

The problem arises with both a unique sample or multiple samples in the samplesheet.

Mitochondria workflow

We have agreed to use the mitochondrial workflow currently implemented in the GATK best practices.

The following steps are included. Modules already exist for some of them; all of the modules need to be included in a subworkflow. We plan to have the mitochondrial subworkflow run by default, but with the possibility to turn it off, and also to turn off the calling of variants for the autosomes.

  • Samtools subsampling [nf-core/raredisease] #49
  • RevertSam [nf-core/raredisease]#106
  • SamtoFastq [nf-core/raredisease]#107
  • BWA
  • GATK MergeBamAlignment
  • Picard MarkDuplicates
  • Haplocheck [nf-core/raredisease]#111
  • call variants with GATK Mutect2
  • Picard LiftoverVCF
  • GATK Mergevcfs [nf-core/raredisease]#113
  • GATK4 [FilterMutectCalls] [nf-core/raredisease]#115
  • GATK Filterblacklist
  • annotation with HmtNote
  • annotation with vep. Because it requires a database, special care has to be taken to run it offline. Cf nf-core/sarek too.
  • bcftools query and
  • bcftools view to prepare input for haplogrep2
  • call mitochondrial haplogroup with haplogrep2 (To be included in workflow)
  • detect mitochondrial deletions with eKLIPse (not in bioconda) (To be included in workflow) (to be checked)

This list can be modified as new issues are created and new modules are added.

Test dataset including mtDNA: https://github.com/nf-core/test-datasets/tree/raredisease
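
As an illustration, the Mutect2 step in the checklist above corresponds roughly to a call like the following (file names are placeholders, and the mitochondrial contig name depends on the reference):

# call mitochondrial variants in mitochondria mode
gatk Mutect2 \
    -R reference.fasta \
    -L chrM \
    --mitochondria-mode \
    -I sample_mt_realigned.bam \
    -O sample_mt_raw.vcf.gz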

update current module versions

Is your feature request related to a problem? Please describe

Samtools and MultiQC are outdated.

Describe the solution you'd like

Update their versions 😃

Describe alternatives you've considered

Additional context

Add bcftools/annotate

Description of feature

Hello 👋, we use this to add additional annotations after vcfanno and to include header lines describing software and case info.
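
In command terms this amounts to something like the following (the annotation file, header file, and column names are placeholders; the tab-delimited annotation file needs to be bgzipped and tabix-indexed):

# add extra annotations and custom header lines on top of the vcfanno output
bcftools annotate \
    --annotations extra_annotations.tsv.gz \
    --header-lines extra_header_lines.txt \
    --columns CHROM,POS,REF,ALT,INFO/CUSTOM_TAG \
    --output-type z \
    --output annotated.vcf.gz \
    vcfanno_output.vcf.gz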
