Crazy idea: when reading into a phylo, should we create a new RNeXML environment, stor

Attaching nexml metadata to phylo objects (rnexml issue, closed)

ropensci commented on June 24, 2024

Attaching nexml metadata to phylo objects

from rnexml.

Comments (8)

sckott commented on June 24, 2024

@cboettig That seems reasonable at small scale, but with huge XML files we would be putting a lot of data into the user's workspace without them realizing it.

cboettig commented on June 24, 2024

In general it would be useful to read in a nexml object that could be passed directly to functions based on ape trees without requiring coercion and dropping of metadata.

I never understood why phylobase didn't do this, but it appears that phylo4 objects do not inherit from the S3 phylo class and cannot be passed to phylo functions without explicit coercion:

```r
library(phylobase)
library(ape)
data(bird.orders)
bird.orders4 <- as(bird.orders, "phylo4")  # convert the ape::phylo tree into a phylobase::phylo4 S4 object
try(plot.phylo(bird.orders4))              # fails: phylo4 does not inherit from the S3 phylo class
```

Of course, phylobase defines a plot method for phylo4, but the more interesting functions are not written for phylo4, so this is a huge handicap. Consider:

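```r
# Number of species for each of the 23 tips (orders) in bird.orders
S <- c(10, 47, 69, 214, 161, 17, 355, 51, 56, 10, 39, 152,
       6, 143, 358, 103, 319, 23, 291, 313, 196, 1027, 5712)
bd.ext(bird.orders, S)        # works with the S3 phylo object
try(bd.ext(bird.orders4, S))  # fails again with the phylo4 object
```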

Anyway, it appears this problem can be solved using setOldClass. I've defined the class phyloS4, which inherits all methods for the S3 phylo class without having to declare those methods explicitly. This way we get the benefits of an S4 class while staying compatible with all the developers who only write functions for the S3 class (as long as those functions don't stupidly check the string identity class(obj) == "phylo" instead of using the proper class check, is(obj, "phylo")).

I can then build a new class, nexmlTree, by extending this class. Again, my new class acts like an S3 phylo in any such function, but adds a representation containing all the nexml data. This approach doesn't minimize the memory footprint, but that is usually not a concern for R users (and coercion is always an option otherwise). It does satisfy the need for an object that works with all existing functions while also containing any and all metadata we can express in nexml.
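For readers who want to see the mechanism, here is a minimal sketch of the setOldClass approach described above. It is not the code in R/extend_phylo.R: a single hypothetical class, `nexmlTreeSketch`, stands in for both phyloS4 and nexmlTree, its `meta` slot stands in for the nexml data, and the toy tree `phy` and the generic `n_tips` are made up for illustration. It needs only the base methods package, not ape.

```r
library(methods)

# Register the S3 class "phylo" so that an S4 class can list it in `contains=`.
setOldClass("phylo")

# Hypothetical class: the S3 "phylo" list becomes the object's data part,
# and the `meta` slot stands in for the attached NeXML metadata.
setClass("nexmlTreeSketch", slots = c(meta = "list"), contains = "phylo")

# Hand-built two-tip tree in ape's S3 "phylo" layout:
# edges 3->1 and 3->2, where node 3 is the root.
phy <- structure(
  list(edge = matrix(c(3L, 3L, 1L, 2L), ncol = 2),
       tip.label = c("a", "b"),
       Nnode = 1L),
  class = "phylo"
)

# The unnamed argument supplies the S3 part; `meta` fills the extra slot.
tree <- new("nexmlTreeSketch", phy, meta = list(license = "CC0"))

# Hypothetical S3 generic with a method written only for "phylo".
n_tips <- function(x, ...) UseMethod("n_tips")
n_tips.phylo <- function(x, ...) length(x$tip.label)

n_tips(tree)       # 2: the "phylo" method is found without declaring anything
is(tree, "phylo")  # TRUE
tree@meta$license  # "CC0"
```

Because the S3 `phylo` list is stored as the object's data part, `x$tip.label` inside `n_tips.phylo` works as it would on a plain `phylo` object.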

See R/extend_phylo.R for the definition.

cboettig commented on June 24, 2024

Looking for feedback on this approach.

It appears that phylobase didn't choose to extend the phylo class in a way that would let phylo4 objects be passed directly to existing functions designed for S3 phylo objects. This is possible, as I have now implemented in the tentatively named nexmlTree class and describe here: http://carlboettiger.info/2013/10/07/nexml-phylo-class-extension.html

On one hand, it seems to make sense to want an object that has the metadata attached, with methods to extract, display, and potentially compute on that metadata, and that still works as a tree object in all existing functions.

On the other hand, this makes a larger object, since it has all this metadata attached (possibly not a problem?). Having users work with this object directly in their workflow, instead of converting to a vanilla phylo object and using that, can also introduce more potential trouble: for instance, as I describe in my linked notes, functions that check the class by string matching instead of with the built-in mechanism will throw an error.
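The string-matching failure mode is easy to reproduce. A minimal sketch, assuming a hypothetical S3 object whose class vector is c("nexmlTree", "phylo"); S3 is used only to keep the example short (the thread's nexmlTree is S4), and the helpers `fragile_check` and `robust_check` are made up for illustration. For S4 objects, `is(obj, "phylo")` is the corresponding inheritance-aware check.

```r
# Hypothetical object: an S3 subclass of "phylo" (its class vector has two entries).
x <- structure(list(tip.label = c("a", "b")), class = c("nexmlTree", "phylo"))

# Fragile: requires the class to be exactly the single string "phylo".
fragile_check <- function(obj) {
  if (!identical(class(obj), "phylo")) stop("object is not of class 'phylo'")
  TRUE
}

# Robust: asks whether the object inherits from "phylo".
robust_check <- function(obj) {
  if (!inherits(obj, "phylo")) stop("object is not of class 'phylo'")
  TRUE
}

robust_check(x)                             # TRUE
res <- try(fragile_check(x), silent = TRUE)
inherits(res, "try-error")                  # TRUE: a valid subclass is rejected
```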

It seems an important design choice whether we build methods around the extended class or have separate methods for working on RNeXML S4 object metadata and just convert that to an ape::phylo for tree methods. @schamberlain @hlapp @rvosa, thoughts?

sckott commented on June 24, 2024

> whether we build methods around the extended class or have separate methods for working on RNeXML S4 object metadata and just convert that to an ape::phylo for tree methods

Do you have a feeling for which is better?

hlapp commented on June 24, 2024

Not clear to me what the concrete consequences for users would be. Can you explicate?

cboettig commented on June 24, 2024

With separate objects, users would have to decide whether to read a NeXML file in as "nexml" (and convert it later), or read it in directly as "phylo" and read it in again later to do anything with the metadata, e.g.:

```r
tree <- nexml_read("file.xml", type = "phylo")        # object of class "phylo"
plot(tree)

nexml_tree <- nexml_read("file.xml", type = "nexml")  # object of class "nexml"
tree <- as(nexml_tree, "phylo")
plot(tree)
```

while for the metadata functions they would have to operate on the nexml object instead:

```r
summary(nexml_tree)
citation(nexml_tree)
license(nexml_tree)
```

(Those methods aren't written yet, btw.)

In Option 2, with a combined interface, the user would use the same object for all purposes:

```r
tree <- nexml_read("file.xml")  # object of class "nexmlTree"
plot(tree)
metadata(tree)
summary(tree)
license(tree)
```

etc. Clearly the interface is cleaner in the latter case. The cost is a larger object in memory and a chance that poorly written phylogenetics functions (at least ones that check class using strings) will fail.

cboettig commented on June 24, 2024

(Um, note that plot(tree) is the ape method plot.phylo; I'm just using it to illustrate any existing method. It could be a richer function like bd.ext, or any function from geiger, OUwie, phytools, etc. Meanwhile, the other 'metadata' functions would be the unique functions provided in RNeXML to handle the metadata. I'm not sure quite what or how many such functions we'll have, but see ideas in #20.)

cboettig commented on June 24, 2024

Okay, I think we can just support both and let the user decide. The metadata methods (now implemented, see #20 (comment) and commit 94996e6) are written for the "nexml" class and inherited by the "nexmlTree" class. By default, I support the second approach; e.g. tree <- nexml_read("file.xml") will read in an object of class "nexmlTree" that acts like a phylo object and has all the metadata attached, with associated methods. Users who would prefer a pure phylo object can coerce this or read it in as such, as shown above.
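Writing the metadata methods once for "nexml" and having "nexmlTree" pick them up is ordinary S4 method inheritance. A minimal sketch with hypothetical names (`nexmlSketch`, `nexmlTreeSketch2`, `licenseOf`), not RNeXML's actual classes or accessors; the real nexmlTree also extends the phylo class, which is left out here to keep the example self-contained.

```r
library(methods)

# Hypothetical stand-in for the "nexml" class, reduced to a single metadata slot.
setClass("nexmlSketch", slots = c(license = "character"))

# Hypothetical metadata accessor, defined once, for the base class only.
setGeneric("licenseOf", function(x) standardGeneric("licenseOf"))
setMethod("licenseOf", "nexmlSketch", function(x) x@license)

# Hypothetical stand-in for "nexmlTree": extends the base class and adds tree data.
setClass("nexmlTreeSketch2", contains = "nexmlSketch",
         slots = c(tip_label = "character"))

doc  <- new("nexmlSketch", license = "CC0")
tree <- new("nexmlTreeSketch2", license = "CC0", tip_label = c("a", "b"))

licenseOf(doc)   # "CC0"
licenseOf(tree)  # "CC0": the method defined for nexmlSketch is inherited
```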

Not sure if users will have any use for the raw nexml class, since the nexmlTree class has the added benefit of working in phylo methods. Still, it is available as an object for any user or developer just needing an R S4 representation of a nexml document.

I think this resolves this question. Re-open with outstanding issues, or feel free to add further questions or comments.
