bugseq-pipeline

BugSeq automatically analyzes clinical microbiology nanopore sequencing data from start to finish. This includes taxonomic classification of reads, antimicrobial resistance prediction and detailed subtyping for public health purposes. It was created during hackseq 2019.

Rationale

Modern clinical microbiology techniques take a day to grow and identify an organism, and another day to determine antimicrobial susceptibilities. Yet clinical studies show that patients with septic shock have a ~7% increase in mortality for every hour of delay in appropriate antimicrobial therapy. Furthermore, patients with rare or novel infections may never have the etiology of their illness diagnosed, as traditional techniques cannot identify their infection. Metagenomic nanopore sequencing could drastically speed up the diagnosis and characterization of infections, including those caused by novel pathogens, enabling better patient outcomes. Recovering pathogen genomes provides a vast amount of clinically useful information, such as whether a patient's E. coli is susceptible to ceftriaxone, whether a patient's V. cholerae is toxigenic, or whether the M. tuberculosis isolates from two patients are likely to be epidemiologically linked.

Quick start

git clone https://github.com/schorlton/bugseq-pipeline.git
cd bugseq-pipeline
nextflow main.nf --fastq in.fq --outdir output_dir

Outline

Requirements

Input

A basecalled nanopore fastq file. Any library version (R7-R10) is accepted, barcoded or not. The data can be amplicon data (e.g. 16S/ITS), isolate data (e.g. a colony of Staphylococcus aureus), or clinical metagenomic data (e.g. the sputum of a patient).

Output

An interactive HTML file and a static PDF summary file. These files show the taxonomic classification of each patient sample as the percentage makeup of organisms, which can include viral, bacterial, fungal and protozoal organisms. For each organism, the presence of antimicrobial resistance genes is visualized, along with the predicted phenotype for the antimicrobial drugs associated with these genes.
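As an illustration of the "percentage makeup" figure, here is a minimal Python sketch that turns per-read taxonomic assignments into percentages. It assumes the percentages are based on classified read counts, which this README does not state; the function name `percent_composition` and the example reads are hypothetical.

```python
from collections import Counter

def percent_composition(read_taxa):
    """Return {taxon: percentage of classified reads}, most abundant first."""
    # None marks an unclassified read and is excluded from the denominator.
    counts = Counter(taxon for taxon in read_taxa if taxon is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: 100.0 * n / total for taxon, n in counts.most_common()}

# Hypothetical per-read assignments for one sample.
reads = (["Escherichia coli"] * 6
         + ["Staphylococcus aureus"] * 3
         + ["Candida albicans"]
         + [None] * 2)
composition = percent_composition(reads)
for taxon, pct in composition.items():
    print(f"{taxon}: {pct:.1f}%")
```

Read counts are not normalized for genome size here, so a real report may weight organisms differently.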

Example here

Options

usage: nextflow main.nf --fastq 'in.fq' --outdir 'out_dir' [options...]
options:
  # Input options
  --fastq
  --control_fastq            Control sample fastqs for complex subtraction from cases. Can use regex patterns or specify multiple files separated by commas.
  
  # Output options
  --outdir
  
  # Pipeline options
  --isolate                 Input file(s) are from isolate sequencing [default: automatic detection]
  --metagenome              Input file(s) are metagenomic samples [default: automatic detection]
  --16S                     Input file(s) are 16S amplicon sequencing data [default: automatic detection]
  --ITS                     Input file(s) are ITS amplicon sequencing data [default: automatic detection]
  --skipQC                  Skip quality assessment and read trimming
  --skipTyping              Skip public health typing analysis
  --skipAMR                 Skip AMR prediction step
  --meanQ [7]               Reads with mean quality below this value will be filtered from analysis
  --minLength [250]         Reads with length below this threshold will be filtered from analysis
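
The effect of --meanQ and --minLength can be sketched in Python as follows. This is an assumption-laden illustration, not the pipeline's code: it assumes Phred+33 quality encoding and computes the mean quality by averaging per-base error probabilities (a common choice for nanopore reads), and the function names are hypothetical.

```python
import math

def mean_quality(quality_string, offset=33):
    """Mean Phred score, computed by averaging per-base error probabilities."""
    if not quality_string:
        return 0.0
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))

def passes_filters(sequence, quality_string, mean_q=7, min_length=250):
    """Keep a read only if it meets both thresholds (defaults mirror the options above)."""
    if len(sequence) < min_length:
        return False
    return mean_quality(quality_string) >= mean_q

# '5' encodes Q20 and '%' encodes Q4 under Phred+33.
print(passes_filters("A" * 300, "5" * 300))  # long, good quality
print(passes_filters("A" * 100, "5" * 100))  # too short
print(passes_filters("A" * 300, "%" * 300))  # mean quality too low
```

Averaging error probabilities rather than raw Phred scores keeps a few very high-quality bases from masking many poor ones.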

Pipeline overview

  1. User inputs basecalled nanopore fastq reads
  2. BugSeq validates the fastq file (fqtools) and determines if it's truly nanopore data
  3. Next, the fastq undergoes quality assessment with FastQC, and the results are combined with MultiQC
  4. Reads are adapter trimmed and demultiplexed (qcat)
  5. Reads are quality and length filtered
  6. Trimmed and demultiplexed reads undergo experiment type detection to determine whether they are amplicon data (e.g. 16S/ITS), cultured isolate data or metagenomic data (magic..., including sourmash)
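
The validation in step 2 is done by fqtools in the pipeline. The Python sketch below is only a stand-in showing what structural validation of a fastq file involves, assuming the 4-line record layout; `validate_fastq` is a hypothetical name and does not reproduce fqtools' checks or the nanopore detection.

```python
import io

def validate_fastq(handle):
    """Count records in a 4-line fastq stream; raise ValueError on the first bad one."""
    count = 0
    while True:
        header = handle.readline()
        if not header:            # end of file
            return count
        if not header.strip():    # tolerate blank lines between records
            continue
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        record = count + 1
        if not header.startswith("@"):
            raise ValueError(f"record {record}: header does not start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"record {record}: separator line does not start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"record {record}: sequence and quality lengths differ")
        count += 1

example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nFFFF\n")
print(validate_fastq(example))  # prints 2
```

A truncated final record fails the separator check, so incomplete uploads are rejected before any analysis starts.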

Isolate data

  1. Genome assembly (Flye)
  2. Taxonomic classification of assembly (minimap2 + Pathoscope ID)

Metagenomic data

  1. Read-level taxonomic classification to species level
  2. Correction for control samples to identify significant pathogens in the cases only
  3. Metagenome assembly (metaFlye)
  4. Taxonomic binning of species within metagenome
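
This README does not specify how the control correction in step 2 is performed. As one possible illustration only, the Python sketch below keeps a taxon when it is absent from the control or when its relative abundance in the case exceeds that in the control by a fold threshold. The function name, the `min_fold` parameter, and the read counts are all hypothetical, and a real implementation would more likely use a statistical test.

```python
def subtract_controls(case_counts, control_counts, min_fold=10.0):
    """Keep case taxa that are absent from, or strongly enriched over, the control."""
    case_total = sum(case_counts.values())
    control_total = sum(control_counts.values())
    if case_total == 0:
        return {}
    kept = {}
    for taxon, n in case_counts.items():
        case_frac = n / case_total
        control_frac = (control_counts.get(taxon, 0) / control_total
                        if control_total else 0.0)
        # Absent from the control, or enriched at least min_fold in the case.
        if control_frac == 0 or case_frac / control_frac >= min_fold:
            kept[taxon] = n
    return kept

# Hypothetical read counts per taxon.
case = {"Klebsiella pneumoniae": 900, "Cutibacterium acnes": 50, "Escherichia coli": 50}
control = {"Cutibacterium acnes": 999, "Klebsiella pneumoniae": 1}
print(subtract_controls(case, control))
```

In this example the skin commensal that dominates the control is dropped, while the organism enriched in the case is retained.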

Pathogen specific analyses

  • Phenotypic antimicrobial resistance prediction
  • MLST (when public scheme available)
  • cgMLST/wgMLST (when public scheme available)
  • Serotyping (when applicable)
  • Other old-school typing (e.g. Spoligotyping for M. tuberculosis)
  • Toxin detection (when clinically relevant)
  • Phylogenetic tree building (when multiple isolates are provided)
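
To illustrate the MLST item above: a sequence type (ST) is assigned by looking up the exact combination of allele numbers across the scheme's loci. The Python sketch below uses a made-up three-locus scheme (real public schemes typically use seven housekeeping genes); the locus names, profiles, and function name are hypothetical and not taken from the pipeline.

```python
def assign_sequence_type(allele_calls, scheme_loci, profiles):
    """Return the ST for an exact allele profile, or None if it cannot be assigned."""
    try:
        key = tuple(allele_calls[locus] for locus in scheme_loci)
    except KeyError:
        return None  # at least one locus was not called
    return profiles.get(key)  # None when the profile is not in the scheme

# Hypothetical toy scheme: allele numbers per locus -> sequence type.
scheme_loci = ("locusA", "locusB", "locusC")
profiles = {(1, 1, 1): 1, (1, 2, 1): 2}

print(assign_sequence_type({"locusA": 1, "locusB": 2, "locusC": 1}, scheme_loci, profiles))
print(assign_sequence_type({"locusA": 3, "locusB": 2, "locusC": 1}, scheme_loci, profiles))
```

An unmatched profile returns None rather than the nearest ST, since a single allele difference can indicate a novel type.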

Changelog

0.0.1

Contributors

schorlton, heathervant
