Comments (20)
No worries on the bugs. I have many to take care of myself.
Right, makes sense on checking if trees are correct.
Okay, will get to this later today hopefully.
from rnexml.
Forget to mention this issue, three commits are relavant here ada8def , c949693 , and ced59fb
from rnexml.
rock on! These look great.
Are they all passing for you? It appears that test_that
tests cannot use variables defined inside other tests, so I moved the shared initializations out above the test_that
calls in test_inheritance.R
, and now most of those pass for me (using test_dir("inst/tests/")
. (Just pushed these).
The first inheritance test still fails, somehow I'm not writing xsi:type
out to the meta
node correctly, so that's something I have to fix.
from rnexml.
Woops, thanks for fixing that.
Right, that first inheritance test fails for me too. They aren't quite identical as you said.
from rnexml.
@schamberlain Thanks for writing the extra test cases. Would you be interested in running some testing of trying to parse a bunch of nexml files (e.g. https://github.com/rvosa/supertreebase/tree/master/data/treebase ) and seeing if we get any errors?
Similarly would be good to try and write a bunch of ape files into NeXML, if you have some lying around (or can grab from the nexus using the treebase api...)
Such tests probably would never become part of our standard unit tests, but it would be good to have run a few more different files through the pipes just to see. The unit tests all reuse the same handful of trees over and over...
from rnexml.
@cboettig Yep, will do both of those.
from rnexml.
working on this now...
from rnexml.
Hey @cboettig , the new tests I wrote in the inst/tests/conversions
file run from a separate set of files here https://github.com/ropensci/RNeXML_testfiles
The conversions
file does not run when you do test_package("RNeXML")
, so that's good.
Please do change the setup, or let me know what you want changed.
Also note that tests still fail on the serializing and top level API tests.
from rnexml.
Okay, tested parsing and conversion on all 3586 NeXML files provided in the supertreebase repo.
Results:
conversion failed: read failed: success
65 2 3519
Not too bad, as we are parsing 98% successfully, which I believe is higher than the success rate in parsing the nexus
files provided by treebase into R (should recompute those numbers). Still, I think there is no good reason why we should be failing on any of these files, so this needs investigation.
Need to understand what happened in the failing cases. (The 3586 count excludes the nexml files that won't parse xmlParse
, mostly because they are empty files). The 2 read fails therefore parse, but still trigger read.nexml
to fail:
> read_failed
[1] "S12760.xml" "S13111.xml"
- Need to investigate what is going wrong here (and then add this situation as a unit test case).
More problematic are the conversions to ape, where
> conversion_failed
[1] "S10327.xml" "S10334.xml" "S10366.xml" "S10436.xml" "S10464.xml"
[6] "S104.xml" "S10548.xml" "S10562.xml" "S10589.xml" "S10644.xml"
[11] "S10648.xml" "S10679.xml" "S10689.xml" "S10772.xml" "S10810.xml"
[16] "S10968.xml" "S10995.xml" "S11021.xml" "S11155.xml" "S11178.xml"
[21] "S11267.xml" "S11459.xml" "S11618.xml" "S11647.xml" "S11786.xml"
[26] "S11911.xml" "S12022.xml" "S12093.xml" "S12101.xml" "S12147.xml"
[31] "S12173.xml" "S12233.xml" "S12300.xml" "S12336.xml" "S12410.xml"
[36] "S12523.xml" "S12536.xml" "S12630.xml" "S12773.xml" "S12786.xml"
[41] "S12880.xml" "S13140.xml" "S13461.xml" "S13664.xml" "S13805.xml"
[46] "S2087.xml" "S235.xml" "S236.xml" "S237.xml" "S239.xml"
[51] "S240.xml" "S241.xml" "S242.xml" "S243.xml" "S244.xml"
[56] "S245.xml" "S246.xml" "S247.xml" "S248.xml" "S249.xml"
[61] "S250.xml" "S251.xml" "S252.xml" "S382.xml" "S696.xml"
- Need to figure out what's going on here
While 3586 is a decent sample size, all these nexml files were created by the same serializer, so they aren't quite independent samples. @hlapp @rvosa Is there any other large collection of nexml files, or other popular serializer of nexml that we should be including as a test case?
Further, this test is not particularly deep, as it checks only that conversions run without error. Should obviously include more validation. @schamberlain updates to the test script are welcome -- test isn't run by test_that
at the moment since it needs the whole 2.5 Gigs of supertreebase data that obviously aren't in the R package, see comments at top of file.
from rnexml.
Good news on mostly successful parsing! Makes sense to do the test script the way you did it.
I'll poke around and see if I an find another set of nexml files, though @hlapp and @rvosa probably have some ideas
from rnexml.
On Oct 24, 2013, at 9:31 PM, Scott Chamberlain wrote:
I'll poke around and see if I an find another set of nexml files, though @hlapp and @rvosaprobably have some ideas
I'm not aware of any other sources.
from rnexml.
Okay, maybe we can create some ourselves @cboettig ?
from rnexml.
@schamberlain yeah. Not crucial perhaps but it would be good to get some sample NeXML files from each of the major tools that write NeXML (e.g. python, perl, ruby libs, though maybe @rvosa has a couple such nexml files lying around already). Actually I think there were a few from various sources in the Kseniia's inst/examples directory that I removed, https://github.com/shumelchyk/RNeXML/tree/0b6788e6b3f554d0f0598ca6a6499b3dcdb67859/tests/examples may as well check those.
Given that NeXML can be validated against its schema it probably doesn't matter so much what tool makes the NeXML. We know/check that we're writing valid nexml from R, and can assume/check that the input nexml files are valid, so the only weak link is in converting the nexml to ape, etc, as I discovered in the bug in #29.
from rnexml.
Fixing #29 reduced us from 65 conversion failures to 32.
- S10366.xml contains a single
<tree>
element which has onlymeta
element children - no nodes or edges. Guess we should throw an explicit error in this case, and suggest converting to / reading in as nexml instead ofnexmlTree
orphylo
to access metadata? - S10436.xml contains several
<tree>
elements, one of which has onlymeta
elements. In this case, we should be able to skip this tree in conversion tophylo
and parse the remaining trees into a multiPhylo list.
from rnexml.
Sorry to be late to the party, but aren't there files created by phenex that could be used for testing? Maybe @hlapp or @balhoff know?
from rnexml.
The NeXML files generated by Phenoscape are in the https://github.com/phenoscape/phenoscape-data repo.
from rnexml.
Specifically, look under "Curation Files". The only caveat I can think of is that some values for "id" attributes start with an integer, which is not valid for XML. Something I didn't realize earlier and need to go through and fix.
from rnexml.
Hi Jim! Thanks for the feedback, nice to see this tagging of user names
works so well. GitHub is good. Hope you're well, btw!
Dr. Rutger A. Vos
Bioinformaticist
Naturalis Biodiversity Center
Visiting address: Office A109, Einsteinweg 2, 2333 CC, Leiden, the
Netherlands
Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands
http://rutgervos.blogspot.com
from rnexml.
@rvosa @balhoff Thanks! At this time we haven't implemented the character matrix side of NeXML (see #12), mostly because I'm unfamiliar with the R packages typically used in this context (I believe bioconductor has quite a few, but I'm mostly familiar only with the cran phylogenetics tools which largely work only with the phylogeny data...
Beyond just extracting the character matrix as an R matrix, not sure what the sensible thing to do is for the associated phenoscape metadata. Any example use case / work flows would be appreciated.
For the moment, we can parse the phenoscape file but RNeXML will drop the <characters>
node, since we don't even have an S4 version of that part of the schema implemented. Of course we can just keep it in R's XML representation, which is in some ways more useful since we can at least then navigate the tree with xpath expressions.
from rnexml.
We now have <characters>
implemented and a full test suite.
from rnexml.
Related Issues (20)
- Replace taxize backend HOT 1
- NCBI URIs HOT 5
- Cut new release to CRAN? HOT 5
- Thoughts on a hex? HOT 3
- Rmarkdown version of toplevel README no longer necessary? HOT 4
- General purpose accessor functions for nexml object inspection HOT 10
- Print summary doesn't deal properly with zero phylogenetic trees HOT 1
- Bug in splitting character matrix into continuous and discrete
- Adding characters fails for some matrices
- dplyr methods select_ and mutate_ are deprecated HOT 2
- New release with new summary() etc HOT 13
- get_characters() returns columns in different ordering than the list of char objects HOT 1
- Ability to drop objects from nexml object
- breaking change introduced in R 4.0.0 HOT 1
- Message from CRAN HOT 5
- tests fail with new dplyr HOT 3
- Warning message: select_() is deprecated as of dplyr 0.7.0
- Add RNeXML_ prefix to tree (and other) classes HOT 13
- Compatibility with dplyr 1.1.0 HOT 6
- nexml.org is down, causing tests to fail HOT 15
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from rnexml.