Comments (8)
@cboettig That seems reasonable at small scale, but with huge XML files we would be putting a lot of data into the user's workspace without them realizing it.
from rnexml.
In general it would be useful to read in a nexml object that could be passed directly to functions based on ape trees without requiring coercion and dropping of metadata.
I never understood why phylobase didn't do this -- but it appears that phylo4 objects do not inherit the phylo S3 class and cannot be passed to phylo functions without explicit coercion:
library(phylobase)
library(ape)
data(bird.orders)
bird.orders4 <- as(bird.orders, "phylo4") # make ape::phylo tree into phylobase::phylo4 S4 class
plot.phylo(bird.orders4) # attempting to use the S4 fails
Of course, a plot function is defined for phylo4, but more interesting functions are not written for phylo4, so this is a huge handicap. Consider:
S <- c(10, 47, 69, 214, 161, 17, 355, 51, 56, 10, 39, 152,
6, 143, 358, 103, 319, 23, 291, 313, 196, 1027, 5712)
bd.ext(bird.orders4, S) # Fails again. Works with the S3 type
Anyway, it appears this problem can be solved using setOldClass. I've defined the class phyloS4, which inherits all methods for the S3 phylo class without having to explicitly declare those methods. In this way, we have the benefits of an S4 class while maintaining compatibility with all developers who only write functions based on the S3 class (as long as functions don't stupidly check the string identity class(obj) == "phylo" instead of using the proper class check is(obj, "phylo")).
I can then build a new class, nexmlTree, by extending this class. Again, my new class acts like an S3 phylo in any such functions, but adds a representation containing all the nexml data. This approach doesn't minimize memory footprint, but usually that is not a concern for R users (otherwise coercion is always an option). It does satisfy the need for an object that works with all existing functions while also containing any and all metadata we can express in nexml.
See R/extend_phylo.R for the definition.
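The actual definition lives in R/extend_phylo.R. As a rough illustration of the mechanism only, here is a minimal sketch that avoids ape entirely: `toyphylo`, `toyTreeS4`, `n_tips` and the `meta` slot are hypothetical names invented for this example, not the package's classes. It assumes nothing beyond the base `methods` package and shows an S4 class extending a registered S3 class, an S3-only function still dispatching on it, and why a string comparison on `class()` misses the inheritance.

```r
library(methods)

# Toy S3 class standing in for ape's "phylo" (hypothetical, illustration only)
toy_tree <- structure(list(tip.label = c("a", "b", "c")), class = "toyphylo")

# An S3-only function, like one a third-party package might provide
n_tips <- function(x) UseMethod("n_tips")
n_tips.toyphylo <- function(x) length(x$tip.label)

# Register the S3 class so that S4 classes are allowed to extend it
setOldClass("toyphylo")

# S4 class that extends the S3 class and adds a slot for metadata
setClass("toyTreeS4", contains = "toyphylo", slots = c(meta = "list"))

obj <- new("toyTreeS4", toy_tree, meta = list(license = "CC0"))

# S3 dispatch follows the S4 inheritance, so the existing method is used
n_tips(obj)

# Inheritance-aware checks see the S3 parent class
is(obj, "toyphylo")
inherits(obj, "toyphylo")

# A string comparison on class() only sees the S4 class name
identical(as.character(class(obj)), "toyphylo")
```

Functions that test `class(obj) == "phylo"` would take the last route and reject the extended object, which is the failure mode mentioned above.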
Looking for feedback on this approach.
It appears that phylobase didn't choose to extend the phylo class in a way that phylo4 objects could be simply passed to existing functions designed for the S3 phylo objects. This is possible, as I have now implemented with the tentatively named nexmlTree class, and describe here: http://carlboettiger.info/2013/10/07/nexml-phylo-class-extension.html
On one hand, it seems to make sense that we want an object that both has the metadata attached to it, with methods that can operate to extract, display, and potentially compute on that metadata, but still works as a tree object in all existing functions.
On the other hand, this makes a larger object, since it has all this metadata attached (possibly not a problem?). Having users work with this object directly in their workflow, instead of converting to a vanilla phylo object and using that, can also introduce more potential trouble (for instance, as I describe in my linked notes, methods that check class with string matching instead of the built-in method will throw an error).
It seems an important design choice whether we build methods around the extended class, or have separate methods for working on RNeXML S4 object metadata and just convert that to an ape::phylo for tree methods. @schamberlain @hlapp @rvosa thoughts?
whether we build methods around the extended class or have separate methods for working on RNeXML S4 object metadata and just convert that to an ape::phylo for tree methods
Do you have a feeling for which is better?
Not clear to me what the concrete consequences for users would be. Can you explicate?
With separate objects (Option 1), users would have to decide either to read in a NeXML file as nexml (and later convert it), or to read it in directly as "phylo" and later read it in again to do anything with the metadata, e.g.:
tree <- nexml_read("file.xml", type="phylo") # object of class "phylo"
plot(tree)
or
nexml_tree <- nexml_read("file.xml", type="nexml") # object of class "nexml"
tree <- as(nexml_tree, "phylo")
plot(tree)
while to perform metadata functions they have to operate on the nexml object instead:
summary(nexml_tree)
citation(nexml_tree)
license(nexml_tree)
(those methods not yet written btw).
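To make the separate-objects workflow concrete, here is a minimal sketch using only the `methods` package. All names (`toyNexml`, `toyphylo`, `toy_license`, the `labels` and `meta` slots) are hypothetical stand-ins, not RNeXML's classes or functions. It assumes the conversion is registered with `setAs`, so `as()` yields a plain S3 tree while the metadata accessor only works on the S4 document object.

```r
library(methods)

# Toy stand-in for ape's S3 "phylo" class, registered for use with S4
setOldClass("toyphylo")

# Toy stand-in for the S4 "nexml" class: tip labels plus document metadata
setClass("toyNexml", slots = c(labels = "character", meta = "list"))

# Coercion to the plain S3 tree; the metadata is not carried over
setAs("toyNexml", "toyphylo", function(from) {
  structure(list(tip.label = from@labels), class = "toyphylo")
})

# A metadata accessor defined only for the S4 document class
setGeneric("toy_license", function(x) standardGeneric("toy_license"))
setMethod("toy_license", "toyNexml", function(x) x@meta$license)

doc  <- new("toyNexml", labels = c("a", "b"), meta = list(license = "CC0"))
tree <- as(doc, "toyphylo")

class(tree)        # plain S3 class, usable by S3-only tree functions
toy_license(doc)   # metadata must be queried on the document object

# The converted tree has no metadata method, so this call fails
res <- try(toy_license(tree), silent = TRUE)
inherits(res, "try-error")
```

The user therefore has to keep both `doc` and `tree` around, which is the bookkeeping cost of this option.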
In Option 2, with a combined interface, the user would use the same object for all purposes:
tree <- nexml_read("file.xml") # object of class "nexmlTree"
plot(tree)
metadata(tree)
summary(tree)
license(tree)
etc. Clearly the interface is cleaner in the latter case. The cost is a larger object in memory and a chance that poorly written phylogenetics functions (at least ones that check class using strings) fail.
(Um, note that plot(tree) is the ape method plot.phylo; I'm just using it to illustrate any existing method. It could be a richer function like bd.ext, or any function from geiger, OUwie, phytools, etc. Meanwhile the other 'metadata' functions would be the unique functions provided in RNeXML to handle the metadata. I'm not sure quite what or how many such functions we'll have, but see ideas in #20)
Okay, I think we can just support both and let the user decide. The metadata methods (now implemented, see #20 (comment) and commit 94996e6) are written for the "nexml" class and inherited by the "nexmlTree" class. By default, I support the second method; e.g. tree <- nexml_read("file.xml") will read in an object of class "nexmlTree" that acts like a phylo object and has all the metadata attached, with associated methods. Users who would prefer a pure phylo object can coerce this or read it in as such, as shown above.
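As a sketch of how one object can serve both purposes, the example below lets a combined class extend both an S4 metadata class and a registered S3 tree class, so S4 metadata methods and S3 tree functions are both inherited. This is an assumption about one possible arrangement, not the layout used in the commit above; `toyNexml`, `toyNexmlTree`, `toyphylo`, `toy_license` and `n_tips` are hypothetical names.

```r
library(methods)

# Toy stand-in for ape's S3 "phylo" class, with one S3-only function
setOldClass("toyphylo")
n_tips <- function(x) UseMethod("n_tips")
n_tips.toyphylo <- function(x) length(x$tip.label)

# Toy stand-in for the S4 "nexml" class, with one metadata method
setClass("toyNexml", slots = c(meta = "list"))
setGeneric("toy_license", function(x) standardGeneric("toy_license"))
setMethod("toy_license", "toyNexml", function(x) x@meta$license)

# Combined class: inherits the metadata slot and methods from toyNexml,
# and the S3 behaviour from toyphylo
setClass("toyNexmlTree", contains = c("toyNexml", "toyphylo"))

s3_tree <- structure(list(tip.label = c("a", "b", "c")), class = "toyphylo")
tree <- new("toyNexmlTree", s3_tree, meta = list(license = "CC0"))

n_tips(tree)        # S3 function written for the tree class
toy_license(tree)   # S4 metadata method written for the document class
```

No method is written for `toyNexmlTree` itself; both calls resolve through inheritance, which is what keeps the combined interface down to a single object.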
Not sure if users will have any use for the raw nexml class, since the nexmlTree class has the added benefit of working in phylo methods. Still, it is available as an object for any user or developer just needing an R S4 representation of a nexml document.
I think this resolves this question. Re-open with outstanding issues, or feel free to add further questions or comments.