
metacoder's People

Contributors

ctb, ethanbass, grabear, grabearummc, grunwald, jotech, zachary-foster, zkamvar


metacoder's Issues

Make `plot_taxonomy` more modular

Currently, plot_taxonomy is over 500 lines long; it should not be. A few things that could be done:

  • Make all text grobs using the same function from a common data frame. Only the x, y, size, color, etc. need to be defined in the function. Do the coordinate conversion in this function as well (see the sketch after this list).
  • Put all that mess associated with the legend in its own function.
  • Make some of the longer internally defined functions independent external functions.
    • get_sub_graphs
    • get_sub_layouts
    • get_search_space
    • find_overlap
    • infer_size_range
    • select_labels
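As a rough illustration of the first point, a single helper could build every text grob from one data frame. A minimal sketch, assuming a hypothetical label_data data frame with label, x, y, size, and color columns already converted to npc coordinates:

library(grid)

# Hypothetical helper: build all text grobs from a common data frame.
# The npc coordinate conversion would also live here.
make_text_grobs <- function(label_data) {
  lapply(seq_len(nrow(label_data)), function(i) {
    row <- label_data[i, ]
    textGrob(label = row$label,
             x = unit(row$x, "npc"),
             y = unit(row$y, "npc"),
             gp = gpar(fontsize = row$size, col = row$color))
  })
}

# Example input (made up)
label_data <- data.frame(label = c("Fungi", "Ascomycota"),
                         x = c(0.3, 0.7), y = c(0.5, 0.5),
                         size = c(14, 10), color = c("black", "grey40"),
                         stringsAsFactors = FALSE)
grobs <- make_text_grobs(label_data)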

Pairwise differential display

Make some sort of graphic for comparing more than two treatments. Perhaps a pairwise set of graphs, each highlighting differences.

Make `taxonomic_sample` function

Make a taxonomic_sample function that abstracts the functionality used in get_taxon_sample so it can be used to recursively subsample any set of observations, as long as functions can be defined to get subtaxa IDs and sample IDs for given taxa.

taxonomic_sample would need the following additional options:

  • get_subtaxa : The function used to get the subtaxa of a given taxon.
  • get_observations : The function used to get observations (e.g. sequence indexes) for a given taxon.

The return type could either be a vector of observation indexes or concatenated get_observations output.

get_taxon_sample would then be rewritten as a special case of taxonomic_sample.
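A minimal sketch of how the recursion might look, using the get_subtaxa and get_observations options described above; the max_counts argument is a placeholder for whatever count-limiting options are chosen, and this is not the final interface:

taxonomic_sample <- function(taxon, get_subtaxa, get_observations, max_counts = Inf) {
  subtaxa <- get_subtaxa(taxon)
  if (length(subtaxa) == 0) {
    # Leaf taxon: sample its observations directly
    obs <- get_observations(taxon)
    return(sample(obs, min(length(obs), max_counts)))
  }
  # Internal taxon: combine recursive subsamples from each subtaxon
  unlist(lapply(subtaxa, taxonomic_sample,
                get_subtaxa = get_subtaxa,
                get_observations = get_observations,
                max_counts = max_counts))
}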

`ncbi_taxon_sample`: handle unranked taxa better

Some taxa do not have a taxonomic level assigned. This makes level-based actions like count filtering not applicable. Currently, high-level taxa without defined levels are not subject to count filtering, sometimes making them inflated. I need a way to handle this more intelligently. Perhaps they could adopt the rank of sister taxa if those exist. Or, there could be a limit on how many consecutive levels of unidentified ranks will be explored before giving up; setting this limit to 0 would provide a way to ignore taxa without assigned levels. Not perfect solutions, but they could help...
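The sister-rank idea could be sketched roughly like this; the ranks and sisters inputs are hypothetical, with NA meaning unranked:

# Let an unranked taxon adopt the most common rank among its ranked sisters
infer_missing_rank <- function(taxon, ranks, sisters) {
  if (!is.na(ranks[taxon])) return(ranks[[taxon]])
  sister_ranks <- ranks[sisters]
  sister_ranks <- sister_ranks[!is.na(sister_ranks)]
  if (length(sister_ranks) == 0) return(NA_character_)  # give up: no ranked sisters
  names(sort(table(sister_ranks), decreasing = TRUE))[1]
}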

plot_taxonomy: make margin take into account labels / standardize stat suffixes

Currently, labels are plotted on a [0,1] coordinate space whereas the other elements are plotted in the space returned by igraph layout functions. Therefore, there is no easy way right now to estimate the size of labels in igraph space, so labels do not affect the plotting window/margins. This means that if the margin is set to 0, labels can be cut off.

To fix this, I need to estimate the size of labels in igraph space. While doing this, I should also standardize the meaning of the statistic suffixes (a toy example follows the list):

  • _u: what the user supplied
  • _t: transformed user input
  • _g: value in igraph space
  • _p: value in terms of proportion of a graph dimension. This has to be used for plotting text.
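A toy example of how one value would move through these suffixes; the transformation and the scaling into igraph space are placeholders:

vertex_size_u <- c(1, 10, 100)               # _u: what the user supplied
vertex_size_t <- log10(vertex_size_u + 1)    # _t: transformed user input
vertex_size_g <- vertex_size_t * 0.5         # _g: value in igraph space (placeholder scaling)
graph_width   <- 10                          # width of the igraph layout (placeholder)
vertex_size_p <- vertex_size_g / graph_width # _p: proportion of a graph dimension, used for text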

Fix plotting anomalies

For some reason, the text in plots is no longer positioned correctly. Also, the default size range of elements is off.

Allow install without emboss

Currently, building the vignettes requires primersearch, from the EMBOSS toolkit, to be installed. This should not be necessary.
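One possible approach, sketched: check for the executable in a vignette setup chunk and make the primersearch-dependent chunks conditional on it, using the standard knitr eval chunk option:

# In a vignette setup chunk: is the EMBOSS primersearch executable on the PATH?
has_primersearch <- nzchar(Sys.which("primersearch"))
# primersearch-dependent chunks can then use the chunk option
#   eval = has_primersearch
# so the vignette still builds (with those chunks skipped) when EMBOSS is absent.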

`plot_taxonomy`: Make text scale with viewport size

Source

This idea and the majority of the descriptive text are from the following blog post:

http://ryouready.wordpress.com/2012/08/01/creating-a-text-grob-that-automatically-adjusts-to-viewport-size/

The method

First, create a new grob class called resizingTextGrob that is supposed to resize automatically:

library(grid)
library(scales)

resizingTextGrob <- function(...)
{
  grob(tg=textGrob(...), cl='resizingTextGrob')
}

The drawDetails method is called automatically when drawing a grob using grid.draw.

drawDetails.resizingTextGrob <- function(x, recording=TRUE)
{
  grid.draw(x$tg)
}

The preDrawDetails method is automatically called before any drawing occurs:

preDrawDetails.resizingTextGrob <- function(x)
{
  # Height of the current viewport in mm
  h <- convertHeight(unit(1, 'snpc'), 'mm', valueOnly=TRUE)
  # Linearly map that height to a font size: taller viewports get larger text
  fs <- rescale(h, to=c(18, 7), from=c(120, 20))
  pushViewport(viewport(gp = gpar(fontsize = fs)))
}

To clean up after drawing, the created viewport is popped:

postDrawDetails.resizingTextGrob <- function(x)
  popViewport()

Test it out

g <- resizingTextGrob(label='test 1')
grid.draw(g)
grid.text('test 2', y=.4)

Integration with ggplot2

library(ggplot2)
x = data.frame(x = 1:10, y = 1:10)
ggplot(data = x, aes(x = x, y = y)) + geom_point() + annotation_custom(g)

New function: extract_taxonomy

A new function to extract the taxonomy information from sequence headers could be very useful. I imagine this would be an upper level function, potentially built upon more specific functions. The basic idea is that the user would supply a vector of sequence headers and identify the locations of bits of information that could be used to derive a taxonomy.

For example, say a particular database had a sequence header of the type:

>name_of_sequence-1234-description
where 1234 is the GenBank ID. The function would be called something like:

x <- c(">name_of_sequence-1234-description", ...)
extract_taxonomy(x, "^>.+-%genid%-.+$")

Where %genid% would be a function-specific pattern indicating the identity of the relevant information. The function would then extract the GenBank ID using a modified version of the supplied regex, look up the taxonomy information, and parse it into a standardized format.

I imagine the output would consist of two parts: a vector of numerical taxon IDs, named by their common names, and something equivalent to an adjacency list, adjacency matrix, or taxon lineage list that allows a given taxon ID to be defined in the context of the entire taxonomy. The taxon ID could be official (e.g. GenBank taxon UIDs) or arbitrary if the sequence header information had a taxonomy lineage:

>name_of_sequence:kingdom-phylum-class-order-family-genus-species:description
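A minimal sketch of the key-substitution step for the GenBank-ID case above; the helper name, the key argument, and the sub-pattern used for the key are all illustrative, not the eventual API:

# Replace the %genid% placeholder with a capturing group, then extract the match
extract_ids <- function(headers, pattern, key = "genid", key_regex = "[0-9]+") {
  full_pattern <- sub(paste0("%", key, "%"),
                      paste0("(", key_regex, ")"),
                      pattern, fixed = TRUE)
  matches <- regmatches(headers, regexec(full_pattern, headers))
  vapply(matches, function(m) m[2], character(1))
}

extract_ids(">name_of_sequence-1234-description", "^>.+-%genid%-.+$")
# [1] "1234"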

`primersearch`: document and standardize output

There is some ambiguity about whether the primer sequences are included in the amplicon and "length" fields. This should be specified. Perhaps an option should be added to choose whether primer sequences are taken into account.

new function: `clean_taxonomy`

I often need to clean up the names of taxa, such as the following (a rough cleanup sketch follows these examples):

Inocybaceae_sp (not informative for species, should be NULL)
Rhodotorula_cycloclastica (should be "cycloclastica")
Caloplaca_sp_RVM_2012
Russula_cf_brevipes_RK8
Russula_brevipes_var_acrior
Russula_aff_brevipes_r_04085
"Crenarchaeota"

`plot_taxonomy`: add option to use default text grobs

The automatically scaling text grobs are ideal for most publication applications, but they are apparently computationally intensive to render (for reasons I have not yet investigated) when there are hundreds or more. It would be good to have an option to use standard text grobs when lots of text needs to be displayed.

Request: add examples in man pages

Documentation in vignettes is a great thing, but having examples in the man pages is extremely helpful, especially when the user wants to quickly see how a certain function is used.
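For instance, once a function like extract_taxonomy exists, its roxygen block could carry a small, fast-running example along these lines; the call itself is hypothetical until that function is written:

#' @examples
#' headers <- c(">seq_one-1234-description", ">seq_two-5678-description")
#' extract_taxonomy(headers, "^>.+-%genid%-.+$")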

extract_taxonomy: mixed id types in classifications

Add the ability to output a mixture of verified unique taxon IDs and arbitrary IDs when looking up classification names. I imagine this could be useful when some taxon names cannot be found in a database. The steps would be roughly as follows (see the sketch after the list):

  1. assign arbitrary ids to each taxon in a classification
  2. look up taxon ids from names
  3. replace arbitrary ids with returned ids
  4. add a column to the taxa output indicating the type of ID ("verified", "arbitrary", or "unknown")
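A toy sketch of steps 3 and 4; the object and column names, and the looked-up IDs, are all made up:

taxa <- data.frame(arbitrary_id = c("t1", "t2", "t3"),
                   name = c("Fungi", "Basidiomycota", "Unknown_clade"),
                   stringsAsFactors = FALSE)
# Pretend step 2 found verified ids for two of the three names (values made up)
looked_up <- c(Fungi = "101", Basidiomycota = "102", Unknown_clade = NA)

found        <- !is.na(looked_up[taxa$name])
taxa$id      <- ifelse(found, looked_up[taxa$name], taxa$arbitrary_id)  # step 3
taxa$id_type <- ifelse(found, "verified", "arbitrary")                  # step 4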

Suggestion: make internal functions truly internal

When I look at the index for metacoder, there are a lot of functions that don't seem to be immediately of use to the user, such as resizingTextGrob(). Additionally, I notice that there are unexported functions documented such as verify_color_range().

These create a lot of clutter in the index for metacoder and might confuse users. I have a couple of suggestions to de-clutter these.

Exported internal functions

Don't export them. You can change/add/remove internal functions to your heart's desire, but an exported function requires a version update and has the potential to break a user's workflow.

Internal documentation

Documenting internal functions is a fantastic idea, but it should not be displayed in the index, since that becomes the table of contents for the user manual. Instead, I recommend adding @keywords internal to your roxygen directives for the unexported functions. This will still create the documentation for those who need it, but will hide it from those who don't.
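Concretely, an unexported helper like verify_color_range() could be documented roughly like this (note there is no @export tag; the body is omitted here):

#' Verify that a color range is valid
#'
#' Internal helper used by the plotting functions.
#'
#' @keywords internal
verify_color_range <- function(color_range) {
  # ... existing implementation ...
}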

Complete workflow vignette

There should be a workflow vignette that provides a few example workflows.

Something like:

  1. Parse example FASTA file with extract_taxonomy
  2. Plot classifications
  3. Subsample with taxonomic_sample
  4. Plot subsampled classifications
  5. In silico PCR
  6. Plot results of in silico PCR
  7. Barcode gap analysis
  8. Plot results of barcode gap analysis

Complete mitochondrial sequences and COX1 as the barcode would be a clean example. Maybe we are trying to evaluate a barcode for a group of insects...
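A very rough outline of the code the vignette could walk through; the function names follow the issues above, but none of the objects or arguments shown here are settled:

# 1-2. Parse an example FASTA file and plot the classifications
obj <- extract_taxonomy(example_headers, "^>.+-%genid%-.+$")  # example_headers is a placeholder
plot_taxonomy(obj)

# 3-4. Subsample with taxonomic_sample and plot again
sub <- taxonomic_sample(obj)
plot_taxonomy(sub)

# 5-6. In silico PCR with primersearch and a plot of the results
pcr <- primersearch(sub)  # primer arguments omitted; interface not settled
plot_taxonomy(pcr)

# 7-8. Barcode gap analysis and a plot of its results would follow here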
