svviz2

This is a near complete rewrite of svviz1. New features:

  • uses bwa mem under the hood for realignments
    • substantial improvements in reliability and speed when realigning long reads
    • enables realignment against entire genome, identifying potential second-best hits
    • calculates a quantitative mapping quality score taking account of ref and alt hits genome-wide
    • uses weighted mapq scores to calculate evidence for ref and alt alleles, including genotype likelihoods
  • substantially improved visualizations
    • "quick consensus" reduces background error rate in pacbio/nanopore and other long-read technologies
    • optionally uses tandem repeat finder (trf) to identify tandem repeats near candidate SV
    • visualization engine has been refactored into a separate genomeview module, facilitating future improvements
  • integrated dotplots
    • visualizes ref vs alt, allowing for visual identification of tandem repeats and other complex sequence
    • if bwa is being used for realignment, visualizes any second-best hit regions against candidate SV locus
    • if long-reads are provided as input, picks several long reads to plot as dotplots against ref and alt
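For readers unfamiliar with dotplots, the general idea can be sketched as follows. This is a minimal illustration of the technique, not svviz2's own implementation: the function name `kmer_dotplot`, the k-mer size and the toy sequences are all hypothetical. A dot is recorded wherever a k-mer in the ref sequence exactly matches a k-mer in the alt sequence, so a repeated segment shows up as several dots sharing the same coordinate on one axis.

```python
from collections import defaultdict


def kmer_dotplot(ref, alt, k=4):
    """Return sorted (ref_pos, alt_pos) pairs whose k-mers are identical.

    Hypothetical helper for illustration only; svviz2 may compute its
    dotplots differently.
    """
    # Index every k-mer start position in the ref sequence.
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)

    # Look up every alt k-mer and record one dot per matching ref position.
    dots = []
    for j in range(len(alt) - k + 1):
        for i in index.get(alt[j:j + k], []):
            dots.append((i, j))
    return sorted(dots)


# Toy example: the ref carries two copies of "ACGT", the alt only one.
ref = "ACGTACGTTT"
alt = "ACGTTT"
dots = kmer_dotplot(ref, alt, k=4)
for ref_pos, alt_pos in dots:
    print(ref_pos, alt_pos)
```

In the toy example, the two dots at alt position 0 (ref positions 0 and 4) are the signature of the duplicated unit, which is the kind of pattern a ref-vs-alt dotplot makes visible.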

Installation

svviz2 requires Python 3.3 or greater. To perform tandem repeat detection, download tandem repeats finder, rename the binary to "trf" and move it into your PATH. To visualize the dotplots, the rpy2 package must be installed.
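A quick way to confirm these prerequisites before installing is sketched below. It is not part of svviz2; the function name `check_prerequisites` and the labels it reports are hypothetical. It uses only the standard library (`shutil.which` and `importlib.util.find_spec`, the latter available from Python 3.4) and does not import rpy2 itself.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite to True (found) or False."""
    return {
        # Minimum interpreter version stated in this README.
        "python>=3.3": sys.version_info >= (3, 3),
        # Tandem repeat detection expects a binary named "trf" on PATH.
        "trf on PATH": shutil.which("trf") is not None,
        # Dotplots need rpy2; find_spec only checks that it is installed.
        "rpy2 installed": importlib.util.find_spec("rpy2") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print("{:<16} {}".format(name, "ok" if ok else "missing"))
```

A missing "trf" or rpy2 only affects the optional tandem repeat and dotplot features described above.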

To install, run the following command, ideally from within a virtualenv:

pip install -U git+https://github.com/nspies/svviz2.git

A few more notable changes with respect to version 1.x

  • variants are input in VCF format; please create an issue if you find a well-defined variant that is not supported by the current version of svviz2
  • VCF files must conform closely to the VCF spec, because svviz2 loads them with pysam, which in turn uses htslib

Note that svviz2 does not natively support parallelization, so you are probably best off parallelizing over variants (or samples). If svviz2 appears to be using more than 1 core during realignment, the likely cause is numpy, which can use multiple threads in some circumstances (this behavior can be deactivated).
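One way to parallelize over variants is to split the VCF into contiguous index ranges and launch one svviz2 process per range using the --first-variant and --last-variant options documented under Usage. The sketch below is an assumption-laden illustration, not a supported workflow: the helper names `chunk_commands` and `run_all`, the file names, and the variant count are hypothetical, and you need to know the number of variants in your VCF beforehand. Setting `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS` and `MKL_NUM_THREADS` to 1 is a common way to keep numpy's numerical backends single-threaded; whether all three are relevant depends on how your numpy was built.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def chunk_commands(n_variants, n_chunks, ref, variants, bams):
    """Build one svviz2 command per contiguous chunk of variant indices.

    Indices are 0-based and both ends are inclusive, matching the
    --first-variant / --last-variant help text.
    """
    if n_variants <= 0 or n_chunks <= 0:
        return []
    size = -(-n_variants // n_chunks)  # ceiling division
    commands = []
    for first in range(0, n_variants, size):
        last = min(first + size, n_variants) - 1
        commands.append(
            ["svviz2", "--ref", ref, "--variants", variants,
             "--first-variant", str(first), "--last-variant", str(last)]
            + list(bams)
        )
    return commands


def run_all(commands, workers):
    """Run the commands concurrently and return their exit codes."""
    # Assumption: these variables limit numpy's backends to one thread each.
    env = dict(os.environ, OMP_NUM_THREADS="1",
               OPENBLAS_NUM_THREADS="1", MKL_NUM_THREADS="1")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda cmd: subprocess.run(cmd, env=env).returncode, commands))


# Hypothetical inputs: 10 variants split across 3 processes.
commands = chunk_commands(10, 3, "ref.fa", "calls.vcf", ["sample.bam"])
for cmd in commands:
    print(" ".join(cmd))
# To actually launch the jobs: exit_codes = run_all(commands, workers=3)
```

Threads are used here only to wait on the child processes; each svviz2 run is a separate process, so the work itself proceeds in parallel.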

Usage

usage: svviz2 [options] [demo] --ref REF --variants VARIANTS BAM [BAM2 ...]

svviz2 version 2.0a2

optional arguments:
  -h, --help            show this help message and exit

Required arguments:
  bam                   sorted, indexed bam file containing reads of interest to plot; can be specified 
                        multiple times to load multiple samples
  --ref REF, -r REF     reference fasta file (a .faidx index file will be created if it doesn't exist so you 
                        need write permissions for this directory)
  --variants VARIANTS, -V VARIANTS
                        the variants to analyze, in vcf or bcf format (vcf files may be compressed with gzip)

Optional arguments:
  --outdir OUTDIR, -o OUTDIR
                        output directory for visualizations, summaries, etc (default: current working 
                        directory)
  --format FORMAT       format for output visualizations; must be one of pdf, png or svg (default: pdf, or svg 
                        if no suitable converter is found)
  --savereads           output the read realignments against the appropriate alt or ref allele (default: false)
  --min-mapq MIN_MAPQ   only reads with mapq>=MIN_MAPQ will be analyzed; when analyzing paired-end data,
                        at least one read end must be near the breakpoints with this mapq (default: 0)
  --align-distance ALIGN_DISTANCE
                        sequence upstream and downstream of breakpoints to include when performing re-alignment
                        (default: infer from data)
  --batch-size BATCH_SIZE
                        Number of reads to analyze at once; larger batch-size values may run more quickly
                        but will require more memory (default=10000)
  --downsample DOWNSAMPLE
                        Ensure the total number of reads per event per sample does not exceed this number by 
                        downsampling (default: infinity)
  --aligner ALIGNER     The aligner to use for realigning reads; either ssw (smith-waterman) or
                        bwa (default=bwa)
  --only-realign-locally
                        Only when using bwa as the aligner backend, when this option is enabled,
                        reads will only be aligned locally around the breakpoints and not also against
                        the full reference genome (default: False)
  --fast                More aggressively skip reads that are unlikely to overlap
                        the breakpoints (default: false)
  --first-variant FIRST_VARIANT
                        Skip all variants before this variant; counting starts with first variant in input VCF 
                        as 0 (default: 0)
  --last-variant LAST_VARIANT
                        Skip all variants after this variant; counting starts with first variant in input VCF 
                        as 0 (default: end of vcf)
  --render-only
  --dotplots-only
  --report-only
  --only-plot-context ONLY_PLOT_CONTEXT
                        Only show this many nucleotides before the first breakpoint, and after the last breakpoint,
                        in each region (default: show as much context as needed to show all reads fully)
  --also-plot-context ALSO_PLOT_CONTEXT
                        Generates two plots per event, one using the default settings, and one generated
                        by zooming in on the breakpoints as per the --only-plot-context option
