deepvariant-glnexus-wdl's Introduction

DeepVariant+GLnexus workflows

These portable WDL workflows use DeepVariant to call variants from WGS read alignments, followed by GLnexus to merge the resulting Genome VCF (gVCF) files for several samples into a Project VCF (pVCF). The wdl/ directory has three nested workflows:

DeepVariant.wdl

Based on the DeepVariant docs, this is the sequential workflow that generates a gVCF from a given BAM file and genomic range.

             +----------------------------------------------------------------------------+
             |                                                                            |
             |  DeepVariant.wdl                                                           |
             |                                                                            |
             |  +-----------------+    +-----------------+    +------------------------+  |
sample.bam   |  |                 |    |                 |    |                        |  |
 genome.fa ----->  make_examples  |---->  call_variants  |---->  postprocess_variants  |-----> gVCF
     range   |  |                 |    |                 |    |                        |  |
             |  +-----------------+    +--------^--------+    +------------------------+  |
             |                                  |                                         |
             |                                  |                                         |
             +----------------------------------|-----------------------------------------+
                                                |
                                       DeepVariant Model

make_examples and call_variants internally parallelize across CPUs on the machine they run on. The tasks use the docker image published by the DeepVariant team.
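The three-stage chain in the diagram can be sketched as a list of command lines. This is a minimal illustration, not the contents of DeepVariant.wdl: the helper name `deepvariant_commands`, the intermediate file names, and the choice of unsharded TFRecord files are assumptions made here. The flags are those documented for the DeepVariant binaries, and the workflow itself may pass additional or different options.

```python
def deepvariant_commands(ref, bam, region, model, out_prefix):
    """Return the three DeepVariant command lines for one BAM and one range.

    Hypothetical helper: file naming is an assumption for illustration only.
    """
    examples = f"{out_prefix}.examples.tfrecord.gz"
    gvcf_records = f"{out_prefix}.gvcf.tfrecord.gz"
    calls = f"{out_prefix}.calls.tfrecord.gz"
    return [
        # Stage 1: turn candidate sites in the range into example tensors.
        ["make_examples", "--mode", "calling",
         "--ref", ref, "--reads", bam, "--regions", region,
         "--examples", examples, "--gvcf", gvcf_records],
        # Stage 2: apply the DeepVariant model to the examples.
        ["call_variants", "--examples", examples,
         "--checkpoint", model, "--outfile", calls],
        # Stage 3: convert model output plus non-variant records into VCF and gVCF.
        ["postprocess_variants", "--ref", ref, "--infile", calls,
         "--outfile", f"{out_prefix}.vcf.gz",
         "--nonvariant_site_tfrecord_path", gvcf_records,
         "--gvcf_outfile", f"{out_prefix}.g.vcf.gz"],
    ]


for cmd in deepvariant_commands("genome.fa", "sample.bam",
                                "17:41196312-41277500", "model.ckpt", "sample"):
    print(" ".join(cmd))
```

Each stage consumes the file written by the previous one, which is why the workflow runs them sequentially within a range.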

htsget_DeepVariant.wdl

To further parallelize WGS calling across several machines, this workflow scatters DeepVariant.wdl across several genomic ranges (typically full-length chromosomes). For each range, it fetches a BAM slice using the GA4GH htsget client in samtools 1.7+, given an htsget server endpoint and sample ID. Finally, it concatenates the per-range gVCFs into the complete sample gVCF with bcftools concat.

             +--------------------------------------------------------------------------------+
             |                                                                                |
             |  htsget_DeepVariant.wdl                                                        |
             |                                                                                |
             |       +-----------------+    +-------------------+                             |
             |       |                 |    |                   |  range gVCF                 |
             |   +--->  htsget client  |---->  DeepVariant.wdl  |---+                         |
             |   |   |  (samtools)     |    |                   |   |                         |
             |   |   |                 |    +-------------------+   |                         |
sample ID    |   |   +-----------------+                            |  +-------------------+  |
             |   |                                                  +-->                   |  |
   ranges -------+---> ...                  ...                 ... --->  bcftools concat  +-----> sample gVCF
    (e.g.    |   |                                                  +-->                   |  |
     chr1    |   |   +-----------------+                            |  +-------------------+  |
     chr2    |   |   |                 |    +-------------------+   |                         |
     ...)    |   +--->  htsget client  |    |                   |   |                         |
             |       |  (samtools)     |---->  DeepVariant.wdl  |---+                         |
             |       |                 |    |                   |  range gVCF                 |
             |       +------------^----+    +-------------------+                             |
             |            |       |                                                           |
             |            |       |                                                           |
             +------------|-------|-----------------------------------------------------------+
                          |       |
               sample ID  |       |
                   range  |       |  range BAM
                          |       |
                     +----v------------+
                     |                 |
                     |  htsget server  |
                     |                 |
                     +-----------------+

By using htsget, the workflow scatters across the ranges without first having to download and slice up a monolithic BAM file.
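To make the per-range fetch concrete, here is a rough sketch of the htsget request that corresponds to one sample and one range. It assumes that the sample ID is appended to the endpoint as a path component and that ranges are written samtools-style (1-based, inclusive); the helper name `htsget_url` is hypothetical. The htsget protocol itself defines `start` as 0-based inclusive and `end` as exclusive, hence the conversion. How the workflow actually invokes samtools is not shown here.

```python
import re
from urllib.parse import urlencode

# Matches samtools-style ranges such as "12:112204691-112247789".
RANGE_RE = re.compile(r"^([^:]+):(\d+)-(\d+)$")


def htsget_url(endpoint, sample_id, genomic_range):
    """Build an htsget reads request URL for one sample and one range."""
    match = RANGE_RE.match(genomic_range)
    if match is None:
        raise ValueError(f"unrecognized range: {genomic_range!r}")
    ref_name = match.group(1)
    start_1based = int(match.group(2))
    end_1based = int(match.group(3))
    query = urlencode({
        "format": "BAM",
        "referenceName": ref_name,
        "start": start_1based - 1,  # 1-based inclusive -> 0-based inclusive
        "end": end_1based,          # 1-based inclusive -> 0-based exclusive
    })
    return f"{endpoint.rstrip('/')}/{sample_id}?{query}"


print(htsget_url(
    "https://htsnexus.rnd.dnanex.us/v1/reads/BroadHiSeqX_b37",
    "NA12878",
    "12:112204691-112247789",
))
```

The server answers such a request with a JSON ticket listing the URLs of the data blocks that make up the slice, so only the reads overlapping the range need to be transferred.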

htsget_DeepVariant_GLnexus.wdl

Scatters htsget_DeepVariant.wdl across several samples to generate an array of gVCF files, then feeds these to GLnexus to merge them into a pVCF.

              +-----------------------------------------------------------+
              |                                                           |
              |  htsget_DeepVariant_GLnexus.wdl                           |
              |                                                           |
              |       +--------------------------+                        |
              |       |                          |   sample gVCF          |
              |   +--->  htsget_DeepVariant.wdl  |----+                   |
              |   |   |                          |    |                   |
              |   |   +--------------------------+    |    +-----------+  |
              |   |                                   +---->           |  |
sample IDs -------+---> ...                      ...  ----->  GLnexus  +----> project VCF
              |   |                                   +---->           |  |
              |   |   +--------------------------+    |    +-----------+  |
              |   |   |                          |    |                   |
              |   +--->  htsget_DeepVariant.wdl  |----+                   |
              |       |                          |   sample gVCF          |
              |       +--------------------------+                        |
              |                                                           |
              +-----------------------------------------------------------+
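The nesting of the two scatters can be sketched as a simple plan: every sample fans out over every range, each sample's range gVCFs are concatenated, and all sample gVCFs go to one GLnexus merge. The function names below are hypothetical and only model the shape of the workflow, not its execution.

```python
def plan_scatter(accessions, ranges):
    """Map each sample to the ranges its inner scatter will process."""
    return {sample: list(ranges) for sample in accessions}


def count_deepvariant_calls(plan):
    """Number of DeepVariant.wdl invocations implied by the plan."""
    return sum(len(sample_ranges) for sample_ranges in plan.values())


plan = plan_scatter(
    ["NA12878", "NA12891", "NA12892"],
    ["12:112204691-112247789", "17:41196312-41277500"],
)
# One DeepVariant.wdl call per (sample, range) pair, one concatenation
# per sample, and a single GLnexus merge over all samples.
print(count_deepvariant_calls(plan), "DeepVariant.wdl calls")
print(len(plan), "sample gVCFs merged by GLnexus")
```

With three samples and two ranges this gives six DeepVariant.wdl calls that can run on separate machines, three concatenations, and one merge.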

Here's an example inputs JSON providing everything required to launch this top-level workflow with dxWDL or Cromwell:

{
    "htsget_DeepVariant_GLnexus.accessions": ["NA12878","NA12891","NA12892"],
    "htsget_DeepVariant_GLnexus.htsget_endpoint": "https://htsnexus.rnd.dnanex.us/v1/reads/BroadHiSeqX_b37",
    "htsget_DeepVariant_GLnexus.ranges": ["12:112204691-112247789","17:41196312-41277500"],
    "htsget_DeepVariant_GLnexus.ref_fasta_gz": (REFERENCE GENOME FILE),
    "htsget_DeepVariant_GLnexus.model_tar": (DEEPVARIANT MODEL FILES),
    "htsget_DeepVariant_GLnexus.output_name": "b37_CEUtrio_ALDH2_BRCA1"
}
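If the inputs are assembled programmatically, a small script along these lines can write a syntactically valid file. It is a sketch only: the helper name `build_inputs` is hypothetical, and the two file values are placeholders, since how a file is referenced depends on the runner and on where the reference genome and model files are stored.

```python
import json

WORKFLOW = "htsget_DeepVariant_GLnexus"


def build_inputs(accessions, htsget_endpoint, ranges,
                 ref_fasta_gz, model_tar, output_name):
    """Return the inputs dict, with every key prefixed by the workflow name."""
    values = {
        "accessions": list(accessions),
        "htsget_endpoint": htsget_endpoint,
        "ranges": list(ranges),
        "ref_fasta_gz": ref_fasta_gz,
        "model_tar": model_tar,
        "output_name": output_name,
    }
    return {f"{WORKFLOW}.{key}": value for key, value in values.items()}


inputs = build_inputs(
    accessions=["NA12878", "NA12891", "NA12892"],
    htsget_endpoint="https://htsnexus.rnd.dnanex.us/v1/reads/BroadHiSeqX_b37",
    ranges=["12:112204691-112247789", "17:41196312-41277500"],
    ref_fasta_gz="REFERENCE_GENOME_FILE",   # placeholder, replace with a real file reference
    model_tar="DEEPVARIANT_MODEL_FILES",    # placeholder, replace with a real file reference
    output_name="b37_CEUtrio_ALDH2_BRCA1",
)
print(json.dumps(inputs, indent=4))
```

Serializing with `json.dumps` avoids hand-editing mistakes such as trailing commas or unquoted values, which JSON parsers reject.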
