pplacer's Introduction

pplacer suite of programs

pplacer places reads on a phylogenetic tree. guppy (Grand Unified Phylogenetic Placement Yanalyzer) yanalyzes them. rppr is a helpful tool for working with reference packages.

pplacer, guppy, and rppr are free software under the GPL v3.

Related tools

Several other tools have used pplacer as one of their main components. Some of these include:

  • SEPP (Mirarab, Nguyen, Warnow, PSB, 2012) aims to improve the scalability of phylogenetic placement using divide-and-conquer. It uses Ensembles of Hidden Markov Models (implemented by HMMER) both to align sequences and to find a (small) subtree for placement using pplacer. SEPP can place on the GreenGenes dataset with 200,000 sequences. A standalone version for the GreenGenes reference is available here.

  • paprica (Bowman and Ducklow, PLOS One, 2015) uses Infernal and pplacer to place reads on a reference tree of 16S rRNA genes from all completed genomes in GenBank. The domains Bacteria, Archaea, and Eukarya are all supported. paprica normalizes for 16S rRNA gene copy number and provides an estimate of the enzymes, metabolic contents, and genomic character (e.g. GC content, genome length) of the community. It is available as an Amazon Machine Instance or VirtualBox appliance; however, we recommend that you install the dependencies and run locally. A basic tutorial can be found here.

pplacer's People

Contributors

alienzj, cmccoy, davidrich27, habnabit, matsen, metasoarous, nhoffman

pplacer's Issues

Merge new-subcommand into dev

A couple of last things need to be done before this merge goes ahead.

  • clusterviz, unifrac, and parts of the old placeviz utility don't have new equivalents, and the former two are still lingering in source control.
  • There should be some tidying-up all around; this has removed a lot of old code, and there's still some more stuff that can be removed. I'll see if I can find a tool that will spot dead code in ocaml.
  • The command group names in guppy.ml should probably be changed or rearranged as they're still just placeholders.
  • There's documentation that needs to be updated, like the readme.
  • Should all references to Mokaphy_{common,base} be refactored/removed?

add a 'changes' file

Currently there's not really any formal way of tracking what changes have been done per release. We should change that!

migration to a real phyloxml implementation

Right now:

  • xphyloxml is what we want to move towards
  • phyloxml is the old version, which assumes the bark maps have write_xml methods
  • myxml is my (deprecated) xml writing stuff

[the various bark files will have to be changed as well to make pxtrees]
placeviz uses them!

write output to dir of source file, not cwd

I have a feature request: I'm having a hard time running placeutil from wrapper scripts because of the way that output file paths are generated. Here's an example - let's say I have a placefile topdir/subdir/qalign.place, and I do this:

% cd topdir
% placeutil --distmat subdir/qalign.place

The result is this:

% ls . subdir
.:
qalign.distmat  qalign.place  subdir/

subdir:
qalign.place

A more intuitive behavior (and one that is easier to manage, IMO) would be to write the output to subdir/qalign.distmat (and without making a copy of qalign.place in topdir). One problem with the current behavior is that the calling script has to do extra work to move the output file around if you don't want it in topdir; another problem is the potential for collisions if you have something like

topdir/subdir1/qalign.place  topdir/subdir2/qalign.place

and want to run --distmat on both. I'd propose either making the default behavior as described above or providing a --samedir argument to do the same thing. Perhaps even better would be to have an argument to explicitly name the output file (maybe by allowing --distmat or any other option that results in a single output file to take an argument like --distmat=subdir/whatever.distmat). Possible?

Thanks!
Noah
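
As a stopgap on the caller's side, a wrapper script can work out the sibling path itself and move the output there. A minimal Python sketch, assuming the output keeps the input's base name and only the extension changes, as in the example above; `sibling_output` is a hypothetical helper and not part of placeutil:

```python
from pathlib import Path


def sibling_output(placefile, new_suffix):
    """Return a path next to `placefile` with its extension swapped.

    Hypothetical helper for wrapper scripts; not part of placeutil.
    """
    return Path(placefile).with_suffix(new_suffix)


# Two inputs with the same base name map to distinct outputs.
for src in ["topdir/subdir1/qalign.place", "topdir/subdir2/qalign.place"]:
    print(sibling_output(src, ".distmat"))
```

Because the directory part of the input is preserved, the two qalign.place files in the collision example end up with separate .distmat paths.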

make a --timing option for pplacer

that reports the total amount of time spent on:

  • building the reference tree
  • ranking the edges (totaled across all query seqs)
  • playing ball (ditto)
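
The bookkeeping behind such an option could look roughly like the following. This is a Python sketch of the idea only, not pplacer code: `PhaseTimer` and the loop body are hypothetical, and the phase names are copied from the list above. Each phase accumulates wall-clock time, so per-query phases are totaled across all query sequences.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = self.clock()
        try:
            yield
        finally:
            # Add to the running total so repeated phases are summed.
            self.totals[name] += self.clock() - start

    def report(self):
        return "\n".join(
            f"{name}: {secs:.3f} s" for name, secs in self.totals.items()
        )


timer = PhaseTimer()
with timer.phase("building the reference tree"):
    pass  # placeholder for the one-off setup
for query in ["q1", "q2", "q3"]:  # made-up query names
    with timer.phase("ranking the edges"):
        pass  # placeholder for per-query work
    with timer.phase("playing ball"):
        pass  # placeholder for per-query work
print(timer.report())
```

Passing the clock in as a parameter keeps the accumulator easy to test with a fake clock.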

repeated header line in output of "guppy classify --csv"

eg in classcompare/pipeline/example:

guppy classify --csv -c vaginal_16s.refpkg refs250bp.json
grep name refs250bp_merged.class.csv | head
1:name,desired_rank,rank,tax_id,likelihood,origin
25:name,desired_rank,rank,tax_id,likelihood,origin
68:name,desired_rank,rank,tax_id,likelihood,origin
90:name,desired_rank,rank,tax_id,likelihood,origin
112:name,desired_rank,rank,tax_id,likelihood,origin
134:name,desired_rank,rank,tax_id,likelihood,origin
156:name,desired_rank,rank,tax_id,likelihood,origin
178:name,desired_rank,rank,tax_id,likelihood,origin
200:name,desired_rank,rank,tax_id,likelihood,origin
222:name,desired_rank,rank,tax_id,likelihood,origin
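
Until the repeated header is fixed in guppy itself, the CSV can be cleaned up after the fact. A minimal Python sketch, assuming every repeat is byte-for-byte identical to the first line of the file; `drop_repeated_headers` is a hypothetical helper and the data rows are made up:

```python
def drop_repeated_headers(lines):
    """Yield lines, skipping any later line identical to the first (header) line."""
    header = None
    for line in lines:
        if header is None:
            header = line
        elif line == header:
            continue
        yield line


rows = [
    "name,desired_rank,rank,tax_id,likelihood,origin\n",
    "made_up_row_a\n",
    "name,desired_rank,rank,tax_id,likelihood,origin\n",
    "made_up_row_b\n",
]
print("".join(drop_repeated_headers(rows)), end="")
```

The function takes any iterable of lines, so an open file handle can be passed in directly.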

guppy error reading file after round-trip through R JSON parser

I don't know if the R JSON parser generates nonstandard output, but here's the error I see when I read in a json-format file and write it back out using R:

% guppy tog -o foo merged.json
% echo "library(rjson); write(toJSON(fromJSON(file='merged.json')), file=file('merged_r.json','w'))" | R --slave
% guppy tog -o foo merged_r.json
Fatal error: exception Jsontype.Invalid_format("expected array, got string")
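
The message says that a value guppy expects to be an array came back as a string. One possible cause, not confirmed here, is a one-element array being written back out as a bare scalar. To find where the two files diverge, the parsed documents can be compared by type. A Python sketch, with the hypothetical helper `type_changes` and made-up documents rather than real placement data:

```python
import json


def type_changes(before, after, path="$"):
    """Yield (path, old_type, new_type) wherever two parsed JSON values differ in type."""
    if type(before) is not type(after):
        yield path, type(before).__name__, type(after).__name__
    elif isinstance(before, dict):
        # Only keys present in both documents are compared.
        for key in sorted(before.keys() & after.keys()):
            yield from type_changes(before[key], after[key], f"{path}.{key}")
    elif isinstance(before, list):
        for i, (b, a) in enumerate(zip(before, after)):
            yield from type_changes(b, a, f"{path}[{i}]")


# Made-up documents, not real placement data.
original = json.loads('{"names": ["seq1"], "n": 1}')
round_tripped = json.loads('{"names": "seq1", "n": 1}')
for change in type_changes(original, round_tripped):
    print(change)
```

For the real files, load merged.json and merged_r.json with `json.load` and pass the two results to `type_changes`.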

Fam_gsl_matvec needs some love.

  • Name sucks
  • Some camelcase and some underscore
  • Needs reorganization
    • Perhaps some submodule action?
  • Use unsafe_get when appropriate

alignment functions need tidying

We have alignments going to lists and to arrays, and no uniform interface.

If we want to layer these list and array functions on top of a generator-like interface, that's fine, but all-at-once would be fine for now. Perhaps some mli files once it settles?

new test needed for guppy heat

All that is really needed is to run tests/data/heat/run.sh.

This seems perhaps like a buildbot thing, as it is most easily run directly from the command line.

add a guppy subcommand to filter out placements by classification

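Here is the old code:

(* Default output prefix: placerun name plus the exclusion file's base name. *)
let filter_prefix = match !out_prefix with
  | "" ->
    (Placerun.get_name pr)^"."^(Filename.basename (!tax_exclude_fname))
  | s -> s
in
(* Tax ids to remove, read from the exclusion file. *)
let removes =
  Tax_id.TaxIdSetFuns.of_list
    (List.map Tax_id.of_string
      (File_parsing.string_list_of_file (!tax_exclude_fname)))
in
(* Keep only pqueries whose best placement is not classified to a removed tax id. *)
Placerun.multifilter
  [filter_prefix,
    (fun pq ->
      not (Tax_id.TaxIdSet.mem
            (Placement.contain_classif
              (Pquery.best_place criterion pq))
            removes))]
  pr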

but it would be nice to have both inclusion and exclusion filters.
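
The combined behavior can be sketched independently of the guppy internals. The following is Python rather than OCaml and only illustrates the predicate: `make_tax_filter` is a hypothetical name, the tax ids are made up, and letting exclusion win over inclusion is an assumption, not a decided design.

```python
def make_tax_filter(include=None, exclude=None):
    """Build a predicate over tax ids (hypothetical helper).

    A tax id is kept if it is in `include` (when given) and not in `exclude`.
    """
    include = None if include is None else frozenset(include)
    exclude = frozenset(exclude or ())

    def keep(tax_id):
        if include is not None and tax_id not in include:
            return False
        # Assumption: exclusion takes precedence over inclusion.
        return tax_id not in exclude

    return keep


# Made-up (query name, tax id of best placement) pairs.
best = [("q1", "A"), ("q2", "B"), ("q3", "C")]
keep = make_tax_filter(include={"A", "B"}, exclude={"B"})
print([name for name, tax_id in best if keep(tax_id)])
```

With no arguments the predicate keeps everything, so a subcommand built this way would behave the same whether one, both, or neither list is supplied.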

rjson wants newline at end of pplacer json format

I'm getting a warning message from R's JSON parser when I read pplacer-generated json files:

% pwd
/home/nhoffman/working/microbiome-demo
% echo "library(rjson); jout <- fromJSON(file='p4z1r36.json')" | R --slave
Warning message:
In readLines(file) : incomplete final line found on 'p4z1r36.json'

It seems to be made happy by adding a newline to the end of the file:

% echo "" >> p4z1r36.json
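
Note that the echo above appends a newline every time it is run. A wrapper script that may see the same file more than once can check first. A minimal Python sketch, with the hypothetical helper `ensure_trailing_newline` and a throwaway demo file:

```python
import os
import tempfile


def ensure_trailing_newline(path):
    """Append a newline to `path` unless it is empty or already ends with one.

    Returns True if the file was modified.
    """
    with open(path, "rb+") as fh:
        fh.seek(0, os.SEEK_END)
        if fh.tell() == 0:
            return False
        fh.seek(-1, os.SEEK_END)
        if fh.read(1) == b"\n":
            return False
        fh.seek(0, os.SEEK_END)
        fh.write(b"\n")
        return True


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.json")
    with open(demo, "w") as fh:
        fh.write("{}")
    print(ensure_trailing_newline(demo))  # first call modifies the file
    print(ensure_trailing_newline(demo))  # second call leaves it alone
```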
