pangraph

a bioinformatic toolkit to align large sets of closely related genomes into a graph data structure

Overview

pangraph provides both a command line interface and a Julia library to find homology amongst large collections of closely related genomes. At its core, the algorithm partitions each genome into pancontigs, each representing a sequence interval related by vertical descent. Each genome is then an ordered walk along pancontigs, and the collection of all genomes forms a graph that captures all observed structural diversity. pangraph is a standalone tool useful to parsimoniously infer horizontal gene transfer events within a community; to perform comparative studies of genome gain, loss, and rearrangement dynamics; or simply to compress many related genomes.
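To make the graph idea concrete, here is a minimal Python sketch of genomes stored as ordered walks along shared pancontigs. It is purely illustrative and makes several simplifying assumptions:

- The names `Step`, `reconstruct` and `edges` are hypothetical.
- Each pancontig is reduced to a single sequence.
- Nothing here reflects pangraph's own Julia types or its JSON output.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical toy model: NOT pangraph's data structures or JSON format.
# Each pancontig is reduced to one sequence; each genome is an ordered
# walk of (pancontig, orientation) steps.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Step:
    block: str     # pancontig identifier
    forward: bool  # True if the genome reads the pancontig on the + strand


def reconstruct(blocks: dict[str, str], walk: list[Step]) -> str:
    """Rebuild a genome by concatenating its pancontigs along the walk."""
    return "".join(
        blocks[s.block] if s.forward else reverse_complement(blocks[s.block])
        for s in walk
    )


def edges(walk: list[Step]) -> set[tuple[str, str]]:
    """Pancontig adjacencies in a linear walk (orientation ignored for brevity)."""
    return {(a.block, b.block) for a, b in zip(walk, walk[1:])}


if __name__ == "__main__":
    blocks = {"A": "ATGC", "B": "GGTT", "C": "CCA"}
    genome1 = [Step("A", True), Step("B", True), Step("C", True)]
    genome2 = [Step("A", True), Step("B", False)]  # C lost, B inverted
    print(reconstruct(blocks, genome1))  # ATGCGGTTCCA
    print(reconstruct(blocks, genome2))  # ATGCAACC
    # The graph is the union of the adjacencies walked by all genomes.
    print(sorted(edges(genome1) | edges(genome2)))  # [('A', 'B'), ('B', 'C')]
```

In this toy model, each shared sequence is stored once, and each genome is only a list of (pancontig, orientation) steps. This is the intuition behind using such a graph to compress many related genomes.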

Installation

The core algorithm and command line tools are self-contained and require no additional dependencies. The library is written in Julia and thus requires Julia to be installed on your machine.

pangraph is available:

  • as a Julia library
  • as a Docker container
  • as a relocatable binary that can be compiled locally

For more detailed installation instructions, please refer to the documentation.

Julia Library

To install pangraph as a Julia library in a local environment:

    # clone the repository
    git clone https://github.com/neherlab/pangraph.git && cd pangraph
    # build the package
    julia --project=. -e 'using Pkg; Pkg.build()'

The library can be accessed directly by entering the REPL:

    julia --project=.

Alternatively, the command-line functionality can be accessed by running the main src/PanGraph.jl script:

    # example: build a graph from E. coli genomes
    julia --project=. src/PanGraph.jl build -c example_datasets/ecoli.fa.gz > graph.json

Note that to access the complete functionality, the optional dependencies must be installed and available in your $PATH.

Docker container

PanGraph is available as a Docker container:

    docker pull neherlab/pangraph:latest

See the documentation for extended instructions on its usage.

Relocatable binary

pangraph can be built locally as a relocatable binary by running the following inside the cloned repository:

    export jc="path/to/julia/executable"
    make pangraph && make install

This will build the executable and place a symlink into bin/. Importantly, if jc is not explicitly set, it defaults to vendor/julia-$VERSION/bin/julia. If this file does not exist, Julia is downloaded automatically, provided the host system is Linux or macOS. Moreover, for the compilation to work, MAFFT and mmseqs2 must be available in your $PATH (see Optional dependencies).

Note that the PackageCompiler.jl documentation recommends using the officially distributed Julia binaries rather than those packaged by your Linux distribution; compilation may fail if you use a distribution-packaged Julia.

Optional dependencies

pangraph can use mash, MAFFT, mmseqs2 or fasttree for some optional functionality, as explained in the documentation. To use these features, it is recommended to install these tools and make them available on your $PATH.

Alternatively, a script bin/setup-pangraph is provided to install both tools into bin/ for Linux-based operating systems.
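The snippet below is a small, hypothetical helper that reports which of the optional tools are visible on $PATH, using Python's `shutil.which`. It is not part of pangraph. The executable names are assumptions: mmseqs2 is usually invoked as `mmseqs`, and fasttree may be installed as either `fasttree` or `FastTree` depending on the distribution.

```python
import shutil

# Hypothetical helper, not part of pangraph. Executable names are
# assumptions; adjust them to match your installation.
OPTIONAL_TOOLS = {
    "mash": ["mash"],
    "MAFFT": ["mafft"],
    "mmseqs2": ["mmseqs"],
    "fasttree": ["fasttree", "FastTree"],
}


def find_optional_tools(tools=OPTIONAL_TOOLS, path=None):
    """Map each tool to the first candidate executable found, or None.

    `path` is passed to shutil.which; None means the current $PATH.
    """
    found = {}
    for tool, candidates in tools.items():
        location = None
        for name in candidates:
            location = shutil.which(name, path=path)
            if location is not None:
                break
        found[tool] = location
    return found


if __name__ == "__main__":
    for tool, location in find_optional_tools().items():
        print(f"{tool:<10} {location or 'not found on $PATH'}")
```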

Examples

Please refer to the tutorials within the documentation for an in-depth usage guide. For a quick reference, see below.

Align the sequences in a multi-FASTA file sequence.fa, then realign each pancontig with MAFFT:

	pangraph build sequence.fa | pangraph polish > graph.json

Export the graph in graph.json as a GFA file, export/pangraph.gfa, for visualization:

	pangraph export graph.json
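GFA files are plain text, so a quick summary does not require a graph viewer. The sketch below assumes the export follows GFA 1, with tab-separated `S` (segment) and `P` (path) records. This README does not state which record types and tags pangraph actually writes, so treat `read_gfa` and `path_length` as hypothetical helpers rather than part of pangraph.

```python
from __future__ import annotations

# Hypothetical helpers, not part of pangraph. Assumes tab-separated GFA 1:
#   S <name> <sequence or *> [tags, e.g. LN:i:<length>]
#   P <path name> <seg1+,seg2-,...> <overlaps>
# Other record types (H, L, ...) are ignored.


def read_gfa(lines) -> tuple[dict[str, int], dict[str, list[tuple[str, str]]]]:
    """Return segment lengths and the oriented segment list of each path."""
    segments: dict[str, int] = {}
    paths: dict[str, list[tuple[str, str]]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            name, seq = fields[1], fields[2]
            if seq == "*":
                # Sequence omitted: fall back to the optional LN:i: length tag.
                tags = [f for f in fields[3:] if f.startswith("LN:i:")]
                segments[name] = int(tags[0][5:]) if tags else 0
            else:
                segments[name] = len(seq)
        elif fields[0] == "P":
            # Each step is a segment name followed by its orientation (+ or -).
            paths[fields[1]] = [(s[:-1], s[-1]) for s in fields[2].split(",")]
    return segments, paths


def path_length(segments: dict[str, int], steps: list[tuple[str, str]]) -> int:
    """Total length in bp of the segments visited by one path."""
    return sum(segments[name] for name, _ in steps)
```

For example, `with open("export/pangraph.gfa") as fh: segments, paths = read_gfa(fh)` loads the exported file. `path_length(segments, paths[name])` then gives the total segment length along each path. This assumes the file matches the GFA 1 layout above.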

Compute all pairwise graphs and estimate the parsimonious number of events between strains, writing all computed data to the directory pairs:

	pangraph marginalize -d pairs graph.json

Citing

PanGraph: scalable bacterial pan-genome graph construction. Nicholas Noll, Marco Molari, Richard Neher. bioRxiv 2022.02.24.481757; doi: https://doi.org/10.1101/2022.02.24.481757

License

MIT License
