hsrishi/nf-bactvar is a Nextflow pipeline for variant calling on bacterial genomes.
The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines and to everyone within the Nextflow community!
TODO: Add pipeline graphic
The core pipeline is composed of the following steps (design in progress):
- Trim and Quality Control reads (trimmomatic)
- Library Composition Check (FastQ Screen)
- Subsample reads (Optional; seqtk)
- Map reads to reference (bwa-mem)
  - Support many-to-many mapping between samples and references
  - Reference indexes should be premade to reduce pipeline run time
- Convert SAM to BAM, Sort, Index (samtools)
- Calculate Genome Coverage (TBD: samtools, BEDTools, picard, or qualimap)
- Mark duplicates (picard)
- Variant calling (TBD: Mutect2, BCFTools, FreeBayes, or DeepVariant)
- Variant annotation (TBD: SnpEff, VEP)
- Variant report (Custom python script)
  - Aggregated report across all samples output as a multi-tab Excel report with aggregate stats, variant-level information with annotations and confidence levels, and custom filters/views
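The many-to-many mapping between samples and references in the mapping step can be pictured with a small Python sketch. This is an illustration only, not the pipeline's implementation: the sample sheet layout, the `MappingJob` type, the helper names, and the sample/reference names are all hypothetical, since the actual input format is still under design.

```python
from typing import Dict, List, NamedTuple


class MappingJob(NamedTuple):
    """One alignment task: a single sample mapped to a single reference."""
    sample: str
    reference: str


def expand_jobs(samples_to_refs: Dict[str, List[str]]) -> List[MappingJob]:
    """Flatten a sample -> references table into one job per (sample, reference) pair."""
    jobs = []
    for sample, references in samples_to_refs.items():
        for reference in references:
            jobs.append(MappingJob(sample, reference))
    return jobs


def samples_by_reference(jobs: List[MappingJob]) -> Dict[str, List[str]]:
    """Group samples under each reference, so every reference appears once."""
    grouped: Dict[str, List[str]] = {}
    for job in jobs:
        grouped.setdefault(job.reference, []).append(job.sample)
    return grouped


# Hypothetical sample sheet: sample and reference names are illustrative only.
sample_sheet = {
    "sampleA": ["ref1", "ref2"],
    "sampleB": ["ref2"],
}

jobs = expand_jobs(sample_sheet)
for job in jobs:
    print(f"{job.sample} -> {job.reference}")

# Each distinct reference needs only one (premade) index, however many samples use it.
print(samples_by_reference(jobs))
```

Grouping by reference shows why premade indexes help: in this sketch `ref2` is shared by two samples, but it would only need to be indexed once.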
The pipeline also supports QC for outputs from Trimming, Alignment, and Variant Calling as a collated MultiQC report.
This cookiecutter template is based on the nf-core template. You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.