This Snakemake workflow provides a comprehensive pipeline for processing Illumina sequencing data, from quality control through to consensus sequence generation and annotation.
- Quality Control: Assess the quality of raw FASTQ files using FastQC.
- MultiQC Report Generation: Compile FastQC reports into a single MultiQC report for easy visualization.
- Read Trimming: Trim adapters and low-quality bases from reads using Trimmomatic.
- Read Alignment: Align trimmed reads to a reference genome using BWA.
- Alignment Sorting and Indexing: Sort and index the aligned reads as BAM files using SAMtools.
- BAM File Statistics: Generate statistics for BAM files using SAMtools.
- Variant Calling: Call variants using bcftools.
- Consensus Sequence Generation: Generate a consensus sequence from the variant calls.
- Quality Assessment of Consensus Sequence: Evaluate the consensus sequence quality with QUAST.
- Annotation: Annotate the consensus sequence using Prokka.
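The steps above form a dependency graph rather than a strict linear list: quality control runs on the raw reads independently of trimming, and both QUAST and Prokka consume the consensus sequence. The following is a minimal sketch of that ordering using Python's standard `graphlib` module. The step names and the exact dependencies are assumptions inferred from the descriptions above; they are not rule names taken from the Snakefile.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps it depends on.
# Step names are hypothetical labels for the stages listed above,
# not the rule names used in the actual Snakefile.
DEPENDENCIES = {
    "fastqc": set(),
    "multiqc": {"fastqc"},
    "trim": set(),
    "align": {"trim"},
    "sort_index": {"align"},
    "bam_stats": {"sort_index"},
    "call_variants": {"sort_index"},
    "consensus": {"call_variants"},
    "quast": {"consensus"},
    "prokka": {"consensus"},
}


def execution_order(dependencies):
    """Return one valid order in which every step follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
print(" -> ".join(order))
```

Snakemake derives a comparable graph on its own from the input and output files declared in each rule, so this sketch is only an illustration of the ordering, not something the workflow needs.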
This workflow requires the following tools, managed via a Conda environment:
- FastQC
- MultiQC
- Trimmomatic
- BWA
- SAMtools
- bcftools
- QualiMap (optional in the provided script; replace it to suit your needs)
- QUAST
- Prokka
- Snakemake
- Ensure Conda is installed on your system.
- Create a Conda environment using the provided environment.yaml file:
conda env create -f environment.yaml
- Activate the Conda environment:
conda activate bioinformatics_workflow
Configuration
Modify the config.yaml file to list your samples and specify the path to the reference genome. Place your raw FASTQ files in the directory designated in config.yaml.
Running the Workflow
With the Conda environment activated and the configuration set, run the workflow with the following command from the directory containing your Snakefile:
snakemake --cores all
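Before starting a run, it can help to confirm that the files named in the configuration actually exist. The sketch below checks a configuration dictionary for missing inputs. The key names (`samples`, `reference`, `raw_dir`) and the paired-end naming pattern `{sample}_R1.fastq.gz` / `{sample}_R2.fastq.gz` are assumptions made for illustration; adapt them to the layout your config.yaml and Snakefile actually use.

```python
from pathlib import Path


def missing_inputs(config):
    """Return the input paths implied by the config that do not exist on disk."""
    raw_dir = Path(config["raw_dir"])
    expected = [Path(config["reference"])]
    for sample in config["samples"]:
        # Assumed paired-end naming; the real workflow may use another pattern.
        for mate in ("R1", "R2"):
            expected.append(raw_dir / f"{sample}_{mate}.fastq.gz")
    return [str(path) for path in expected if not path.exists()]


# Hypothetical configuration mirroring what config.yaml might contain.
example_config = {
    "samples": ["sampleA", "sampleB"],
    "reference": "reference/genome.fasta",
    "raw_dir": "raw",
}

for path in missing_inputs(example_config):
    print("missing:", path)
```

An empty result means every expected file was found; any printed path points to a file to add or a configuration entry to correct.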
Output
The workflow will generate the following outputs in the designated directories:
- Quality control reports in fastqc/ and multiqc/
- Trimmed reads in trimmed/
- Aligned reads and statistics in alignment/ and qc/
- Variant calls and consensus sequences in vcf/ and consensus/
- Annotation results in prokka/
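After a run, a quick way to see which stages produced results is to count the files in each output directory. This is a minimal sketch that assumes the directories listed above sit directly under the working directory; the file names inside them are not assumed.

```python
from pathlib import Path

# Output directories named in this README.
OUTPUT_DIRS = [
    "fastqc", "multiqc", "trimmed", "alignment",
    "qc", "vcf", "consensus", "prokka",
]


def summarize_outputs(root="."):
    """Map each output directory to its file count, or None if it is absent."""
    summary = {}
    for name in OUTPUT_DIRS:
        directory = Path(root) / name
        if directory.is_dir():
            # Count regular files at any depth below the directory.
            summary[name] = sum(1 for p in directory.rglob("*") if p.is_file())
        else:
            summary[name] = None
    return summary


for name, count in summarize_outputs(".").items():
    status = "not found" if count is None else f"{count} file(s)"
    print(f"{name}/: {status}")
```

A directory reported as "not found" or with zero files suggests the corresponding step did not run or failed, which is a reasonable place to start checking the Snakemake log.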