Giter VIP home page Giter VIP logo

Comments (20)

sckott avatar sckott commented on June 24, 2024

No worries on the bugs. I have many to take care of myself.

Right, makes sense on checking if trees are correct.

Okay, will get to this later today hopefully.

from rnexml.

sckott avatar sckott commented on June 24, 2024

Forget to mention this issue, three commits are relavant here ada8def , c949693 , and ced59fb

from rnexml.

cboettig avatar cboettig commented on June 24, 2024

rock on! These look great.

Are they all passing for you? It appears that test_that tests cannot use variables defined inside other tests, so I moved the shared initializations out above the test_that calls in test_inheritance.R, and now most of those pass for me (using test_dir("inst/tests/"). (Just pushed these).

The first inheritance test still fails, somehow I'm not writing xsi:type out to the meta node correctly, so that's something I have to fix.

from rnexml.

sckott avatar sckott commented on June 24, 2024

Woops, thanks for fixing that.

Right, that first inheritance test fails for me too. They aren't quite identical as you said.

from rnexml.

cboettig avatar cboettig commented on June 24, 2024

@schamberlain Thanks for writing the extra test cases. Would you be interested in running some testing of trying to parse a bunch of nexml files (e.g. https://github.com/rvosa/supertreebase/tree/master/data/treebase ) and seeing if we get any errors?

Similarly would be good to try and write a bunch of ape files into NeXML, if you have some lying around (or can grab from the nexus using the treebase api...)

Such tests probably would never become part of our standard unit tests, but it would be good to have run a few more different files through the pipes just to see. The unit tests all reuse the same handful of trees over and over...

from rnexml.

sckott avatar sckott commented on June 24, 2024

@cboettig Yep, will do both of those.

from rnexml.

sckott avatar sckott commented on June 24, 2024

working on this now...

from rnexml.

sckott avatar sckott commented on June 24, 2024

Hey @cboettig , the new tests I wrote in the inst/tests/conversions file run from a separate set of files here https://github.com/ropensci/RNeXML_testfiles

The conversions file does not run when you do test_package("RNeXML"), so that's good.

Please do change the setup, or let me know what you want changed.

Also note that tests still fail on the serializing and top level API tests.

from rnexml.

cboettig avatar cboettig commented on June 24, 2024

Okay, tested parsing and conversion on all 3586 NeXML files provided in the supertreebase repo.

Results:

conversion failed:       read failed:            success 
                65                  2               3519 

Not too bad, as we are parsing 98% successfully, which I believe is higher than the success rate in parsing the nexus files provided by treebase into R (should recompute those numbers). Still, I think there is no good reason why we should be failing on any of these files, so this needs investigation.

Need to understand what happened in the failing cases. (The 3586 count excludes the nexml files that won't parse xmlParse, mostly because they are empty files). The 2 read fails therefore parse, but still trigger read.nexml to fail:

> read_failed
[1] "S12760.xml" "S13111.xml"
  • Need to investigate what is going wrong here (and then add this situation as a unit test case).

More problematic are the conversions to ape, where

> conversion_failed
 [1] "S10327.xml" "S10334.xml" "S10366.xml" "S10436.xml" "S10464.xml"
 [6] "S104.xml"   "S10548.xml" "S10562.xml" "S10589.xml" "S10644.xml"
[11] "S10648.xml" "S10679.xml" "S10689.xml" "S10772.xml" "S10810.xml"
[16] "S10968.xml" "S10995.xml" "S11021.xml" "S11155.xml" "S11178.xml"
[21] "S11267.xml" "S11459.xml" "S11618.xml" "S11647.xml" "S11786.xml"
[26] "S11911.xml" "S12022.xml" "S12093.xml" "S12101.xml" "S12147.xml"
[31] "S12173.xml" "S12233.xml" "S12300.xml" "S12336.xml" "S12410.xml"
[36] "S12523.xml" "S12536.xml" "S12630.xml" "S12773.xml" "S12786.xml"
[41] "S12880.xml" "S13140.xml" "S13461.xml" "S13664.xml" "S13805.xml"
[46] "S2087.xml"  "S235.xml"   "S236.xml"   "S237.xml"   "S239.xml"  
[51] "S240.xml"   "S241.xml"   "S242.xml"   "S243.xml"   "S244.xml"  
[56] "S245.xml"   "S246.xml"   "S247.xml"   "S248.xml"   "S249.xml"  
[61] "S250.xml"   "S251.xml"   "S252.xml"   "S382.xml"   "S696.xml" 
  • Need to figure out what's going on here

While 3586 is a decent sample size, all these nexml files were created by the same serializer, so they aren't quite independent samples. @hlapp @rvosa Is there any other large collection of nexml files, or other popular serializer of nexml that we should be including as a test case?

Further, this test is not particularly deep, as it checks only that conversions run without error. Should obviously include more validation. @schamberlain updates to the test script are welcome -- test isn't run by test_that at the moment since it needs the whole 2.5 Gigs of supertreebase data that obviously aren't in the R package, see comments at top of file.

from rnexml.

sckott avatar sckott commented on June 24, 2024

Good news on mostly successful parsing! Makes sense to do the test script the way you did it.

I'll poke around and see if I an find another set of nexml files, though @hlapp and @rvosa probably have some ideas

from rnexml.

hlapp avatar hlapp commented on June 24, 2024

On Oct 24, 2013, at 9:31 PM, Scott Chamberlain wrote:

I'll poke around and see if I an find another set of nexml files, though @hlapp and @rvosaprobably have some ideas

I'm not aware of any other sources.

from rnexml.

sckott avatar sckott commented on June 24, 2024

Okay, maybe we can create some ourselves @cboettig ?

from rnexml.

cboettig avatar cboettig commented on June 24, 2024

@schamberlain yeah. Not crucial perhaps but it would be good to get some sample NeXML files from each of the major tools that write NeXML (e.g. python, perl, ruby libs, though maybe @rvosa has a couple such nexml files lying around already). Actually I think there were a few from various sources in the Kseniia's inst/examples directory that I removed, https://github.com/shumelchyk/RNeXML/tree/0b6788e6b3f554d0f0598ca6a6499b3dcdb67859/tests/examples may as well check those.

Given that NeXML can be validated against its schema it probably doesn't matter so much what tool makes the NeXML. We know/check that we're writing valid nexml from R, and can assume/check that the input nexml files are valid, so the only weak link is in converting the nexml to ape, etc, as I discovered in the bug in #29.

from rnexml.

cboettig avatar cboettig commented on June 24, 2024

Fixing #29 reduced us from 65 conversion failures to 32.

  • S10366.xml contains a single <tree> element which has only meta element children - no nodes or edges. Guess we should throw an explicit error in this case, and suggest converting to / reading in as nexml instead of nexmlTree or phylo to access metadata?
  • S10436.xml contains several <tree> elements, one of which has only meta elements. In this case, we should be able to skip this tree in conversion to phylo and parse the remaining trees into a multiPhylo list.

from rnexml.

rvosa avatar rvosa commented on June 24, 2024

Sorry to be late to the party, but aren't there files created by phenex that could be used for testing? Maybe @hlapp or @balhoff know?

from rnexml.

hlapp avatar hlapp commented on June 24, 2024

The NeXML files generated by Phenoscape are in the https://github.com/phenoscape/phenoscape-data repo.

from rnexml.

balhoff avatar balhoff commented on June 24, 2024

Specifically, look under "Curation Files". The only caveat I can think of is that some values for "id" attributes start with an integer, which is not valid for XML. Something I didn't realize earlier and need to go through and fix.

from rnexml.

rvosa avatar rvosa commented on June 24, 2024

Hi Jim! Thanks for the feedback, nice to see this tagging of user names
works so well. GitHub is good. Hope you're well, btw!

Dr. Rutger A. Vos
Bioinformaticist
Naturalis Biodiversity Center
Visiting address: Office A109, Einsteinweg 2, 2333 CC, Leiden, the
Netherlands
Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands
http://rutgervos.blogspot.com

from rnexml.

cboettig avatar cboettig commented on June 24, 2024

@rvosa @balhoff Thanks! At this time we haven't implemented the character matrix side of NeXML (see #12), mostly because I'm unfamiliar with the R packages typically used in this context (I believe bioconductor has quite a few, but I'm mostly familiar only with the cran phylogenetics tools which largely work only with the phylogeny data...

Beyond just extracting the character matrix as an R matrix, not sure what the sensible thing to do is for the associated phenoscape metadata. Any example use case / work flows would be appreciated.

For the moment, we can parse the phenoscape file but RNeXML will drop the <characters> node, since we don't even have an S4 version of that part of the schema implemented. Of course we can just keep it in R's XML representation, which is in some ways more useful since we can at least then navigate the tree with xpath expressions.

from rnexml.

cboettig avatar cboettig commented on June 24, 2024

We now have <characters> implemented and a full test suite.

from rnexml.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.