Snakemake workflow: FastQC and MultiQC of Next-generation Sequencing Data

Motivation

This is a Snakemake pipeline for quality control of Illumina next-generation sequencing data. It runs FastQC on raw FASTQ files and merges the resulting FastQC reports into a single report using MultiQC.

Prerequisites

  • Conda is a package, dependency, and environment management system used to install software packages and manage their dependencies. It runs on Linux, macOS, and Windows; it was created for Python programs but can package and distribute software for any language. Install conda for your operating system (Linux or macOS).

  • Snakemake is a workflow management system for creating reproducible and scalable data analyses.

  • FastQC is a quality control tool for high throughput sequence data.

  • MultiQC is a tool to aggregate bioinformatics results across many samples into a single report.

Configuration

The configuration file is located in config/config.yaml. This file contains paths to input files and directories, output directories, and other settings.

Usage

  1. Clone the repository:
git clone https://github.com/kevin-wamae/fastqc-multiqc-pipeline.git
  2. Navigate into the cloned directory:
cd fastqc-multiqc-pipeline
  3. Create a conda environment (named fastqc-multiqc-pipeline) for the pipeline:
conda env create --file workflow/envs/environment.yaml
  4. Activate the conda environment. This needs to be done every time you restart your terminal and want to re-run the pipeline:
conda activate fastqc-multiqc-pipeline
  5. Run the pipeline with Snakemake:
snakemake --cores <number_of_cores> --use-conda
    • The --cores option specifies the number of cores to use, and the --use-conda option tells Snakemake to use the conda environments specified in the workflow.
  6. Finally, adjust the configuration in config/config.yaml if needed.
    • This file contains paths to input files and directories, output directories, and other settings such as the number of cores to use.
    • You can edit this file to suit your needs.
    • For example, you can change the number of cores by editing the extra:threads: parameter.
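
The run command from step 5 can also be assembled programmatically, for example to preview it before launching. The following is a minimal sketch, not part of the pipeline: the helper name build_snakemake_command and the default of using every available core are assumptions made for illustration. The --dry-run flag is a standard Snakemake option that lists the jobs that would run without executing them.

```python
import os
import shlex


def build_snakemake_command(cores=None, dry_run=False):
    """Return the Snakemake invocation from step 5 as a list of arguments.

    Hypothetical helper for illustration; it is not shipped with the pipeline.
    """
    # Assumption: default to every core the machine reports, or 1 if unknown.
    if cores is None:
        cores = os.cpu_count() or 1
    command = ["snakemake", "--cores", str(cores), "--use-conda"]
    if dry_run:
        # --dry-run shows the planned jobs without executing them.
        command.append("--dry-run")
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_snakemake_command(cores=4, dry_run=True)))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; once the output looks right, it can be pasted into a terminal where the conda environment is active.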

Output

  • The pipeline generates HTML reports for each sample in the fastqc directory and a merged HTML report in the multiqc directory.
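
To check that a run produced every per-sample report, it can help to predict the report names from the input files. The sketch below assumes FastQC's default naming, where the FASTQ extension is stripped and _fastqc.html is appended, and assumes the reports are written directly into the fastqc directory. The function name, the suffix list, and the example file name are hypothetical; the pipeline's rules may lay out files differently.

```python
from pathlib import PurePosixPath

# Common FASTQ extensions (assumption: inputs use one of these).
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def expected_fastqc_report(fastq_path, outdir="fastqc"):
    """Guess the FastQC HTML report path for one FASTQ file.

    Hypothetical helper; assumes FastQC's default report naming.
    """
    name = PurePosixPath(fastq_path).name
    # Strip the first matching FASTQ extension, if any.
    for suffix in FASTQ_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return str(PurePosixPath(outdir) / f"{name}_fastqc.html")


if __name__ == "__main__":
    # Hypothetical input file name, used only for illustration.
    print(expected_fastqc_report("data/sampleA_R1.fastq.gz"))
```

Comparing these predicted paths against the files actually present in the fastqc directory is a quick way to spot samples whose FastQC job did not finish.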

Dependencies

  • This pipeline uses conda environments to manage dependencies for each rule. The environments are defined in envs/fastqc.yaml and envs/multiqc.yaml.

Contact

  • Report any issues or bugs by opening an issue on the GitHub repository, or contact me via email (wamaekevin[at]gmail.com).
