PatoMatic

Bioinformatics Workflow for Illumina Data Analysis

This Snakemake workflow provides a comprehensive pipeline for processing Illumina sequencing data, from quality control through to consensus sequence generation and annotation.

Workflow Steps

Quality Control: Assess the quality of raw FASTQ files using FastQC.
MultiQC Report Generation: Compile FastQC reports into a single MultiQC report for easy visualization.
Read Trimming: Trim adapters and low-quality bases from reads using Trimmomatic.
Read Alignment: Align trimmed reads to a reference genome using BWA.
Alignment Sorting and Indexing: Sort and index the aligned reads (BAM files) with SAMtools.
BAM File Statistics: Generate statistics for BAM files using SAMtools.
Variant Calling: Call variants using bcftools.
Consensus Sequence Generation: Generate a consensus sequence from the variant calls.
Quality Assessment of Consensus Sequence: Evaluate the consensus sequence quality with QUAST.
Annotation: Annotate the consensus sequence using Prokka.
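As a rough illustration of what the steps above amount to on the command line, the sketch below prints one plausible command per step for a single sample. It is not taken from the repository's Snakefile. The function name `print_steps`, the paired-end file naming (`raw/<sample>_R1.fastq.gz`), the adapter file `adapters.fa`, the thread count, the trimming thresholds, and the `quast/` directory are all assumptions made for the example.

```shell
# Print one plausible command per workflow step for a single sample.
# Assumptions (not from the repository): paired-end reads in raw/,
# an adapter file adapters.fa, 4 threads, and a quast/ output directory.
print_steps() {
  local s="$1" ref="$2"
  cat <<EOF
mkdir -p fastqc multiqc trimmed alignment qc vcf consensus prokka quast
fastqc -o fastqc raw/${s}_R1.fastq.gz raw/${s}_R2.fastq.gz
multiqc fastqc -o multiqc
trimmomatic PE raw/${s}_R1.fastq.gz raw/${s}_R2.fastq.gz \
  trimmed/${s}_R1.fastq.gz trimmed/${s}_R1.unpaired.fastq.gz \
  trimmed/${s}_R2.fastq.gz trimmed/${s}_R2.unpaired.fastq.gz \
  ILLUMINACLIP:adapters.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:36
bwa index ${ref}
bwa mem -t 4 ${ref} trimmed/${s}_R1.fastq.gz trimmed/${s}_R2.fastq.gz | samtools sort -o alignment/${s}.sorted.bam -
samtools index alignment/${s}.sorted.bam
samtools flagstat alignment/${s}.sorted.bam > qc/${s}.flagstat.txt
bcftools mpileup -f ${ref} alignment/${s}.sorted.bam | bcftools call -mv -Oz -o vcf/${s}.vcf.gz
bcftools index vcf/${s}.vcf.gz
bcftools consensus -f ${ref} vcf/${s}.vcf.gz > consensus/${s}.fasta
quast.py consensus/${s}.fasta -r ${ref} -o quast/${s}
prokka --outdir prokka/${s} --prefix ${s} consensus/${s}.fasta
EOF
}

# Hypothetical sample name and reference path.
print_steps sampleA reference.fasta
```

The function only prints text, so it is safe to run anywhere. Piping its output to `bash` would execute the steps, provided the tools are installed and the assumed input files exist. In the actual workflow, Snakemake decides the order and the exact commands.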

Dependencies

This workflow requires the following tools, managed via a Conda environment:

FastQC
MultiQC
Trimmomatic
BWA
SAMtools
bcftools
QualiMap (optional in the provided script; replace it with a tool that suits your needs)
QUAST
Prokka
Snakemake

Setup

Create the Conda Environment

Ensure Conda is installed on your system.
Create a Conda environment using the provided environment.yaml file:

conda env create -f environment.yaml

Activate the Conda environment:

conda activate bioinformatics_workflow

Configuration

Modify the config.yaml file to list your samples and specify the path to the reference genome. Place your raw FASTQ files in the designated directory as outlined in the config.yaml.

Running the Workflow

With the Conda environment activated and the configuration set, run the workflow using the following command in the directory containing your Snakefile:

snakemake --cores all
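Before a full run, it can help to look at what a configuration might contain and to preview the planned jobs. The sketch below writes a hypothetical example to `config.example.yaml` (so an existing config.yaml is not overwritten) and then asks Snakemake for a dry run. The key names `samples` and `reference`, the sample names, and the reference path are assumptions; the real keys are whatever the repository's config.yaml and Snakefile define.

```shell
# Write a hypothetical example configuration to a separate file.
# Key names and values are assumptions, not taken from the PatoMatic repository.
cat > config.example.yaml <<'EOF'
samples:
  - sampleA
  - sampleB
reference: reference/genome.fasta
EOF

# Preview the planned jobs without executing them:
# -n is a dry run, -p prints the shell command of each job.
if command -v snakemake >/dev/null 2>&1 && [ -f Snakefile ]; then
  snakemake -n -p --cores all
else
  echo "snakemake or Snakefile not found; activate the Conda environment and run from the workflow directory" >&2
fi
```

A dry run reports missing input files and rule errors without executing any jobs, so mistakes in the sample list or reference path tend to surface early.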

Output

The workflow will generate the following outputs in the designated directories:

Quality control reports in fastqc/ and multiqc/
Trimmed reads in trimmed/
Aligned reads and statistics in alignment/ and qc/
Variant calls and consensus sequences in vcf/ and consensus/
Annotation results in prokka/
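After a run, a quick check that the output directories listed above exist can be sketched as follows. The function name `check_outputs` is hypothetical, and the check only confirms that each directory is present, not that its contents are complete.

```shell
# Report which of the expected output directories exist in the current directory.
# Succeeds (exit status 0) only if all of them are present.
check_outputs() {
  local missing=0 dir
  for dir in fastqc multiqc trimmed alignment qc vcf consensus prokka; do
    if [ -d "$dir" ]; then
      echo "ok       $dir/"
    else
      echo "missing  $dir/"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Run from the workflow directory after snakemake has finished.
check_outputs || echo "some output directories are missing" >&2
```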
