
Comments (11)

rvosa avatar rvosa commented on June 24, 2024

Do we expect people to alter the DOM tree a lot (i.e. are there risks of clashes if we use a simpler scheme)? Otherwise, maybe tag name + a counter is more concise?

from rnexml.

cboettig avatar cboettig commented on June 24, 2024

@rvosa Yeah, I'm not sure; I'm still trying to wrap my head around this one. Most users will probably only use the top-level API for writing an ape::phylo tree or a list of trees (ape::multiPhylo) to NeXML, in which case we can number them as we go. But the S4 class-based interface we have so far also allows users to coerce ape::phylo trees into the S4 RNeXML::tree class, which can then be inserted into NeXML later. Perhaps we don't want users doing that, but it means they could build up the DOM modularly, and we would then have to watch out for id collisions.

Or in more concrete terms, I have this setAs("phylo", "tree" ...) subroutine for mapping phylo objects to the S4 object that mimics the schema. Since the phylo object doesn't have an ID, I either have to generate one at this time, or otherwise add the id when adding the tree to an existing or new nexml/trees object. Does that make sense?

In other news, the validator complains that UUIDs aren't valid id attributes:

... is not a valid value of the atomic type 'xs:ID'


hlapp avatar hlapp commented on June 24, 2024

On Aug 14, 2013, at 10:59 PM, Carl Boettiger wrote:

> In other news, the validator complains that UUIDs aren't valid id attributes:

That sounds like a validator bug.


rvosa avatar rvosa commented on June 24, 2024

What do the UUIDs look like? The schema specifies that the type of @id is
xs:ID, which is a non-colonized name (NCName), so instance documents must
conform to the production rules of NCNames (probably most importantly:
start with a letter or an underscore). If they don't, I don't see how the
validator is at fault here.
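
As a rough illustration of the rule described above, here is a minimal sketch in base R. It is an ASCII-only approximation of the NCName production (the real grammar also admits many non-ASCII letters), and the helper name `is_ascii_ncname` is hypothetical, not part of RNeXML or any validator.

```r
# ASCII-only approximation of the NCName rule that xs:ID values must satisfy:
# first character is a letter or underscore; the rest may be letters, digits,
# '.', '_' or '-'. Colons are never allowed.
# Assumption: non-ASCII name characters are ignored in this sketch.
is_ascii_ncname <- function(x) {
  grepl("^[A-Za-z_][A-Za-z0-9._-]*$", x)
}

# Made-up candidate ids, for illustration only.
candidates <- c(
  "t1",      # starts with a letter
  "_node",   # starts with an underscore
  "1tree",   # starts with a digit
  "ns:tree"  # contains a colon
)

result <- is_ascii_ncname(candidates)
print(setNames(result, candidates))
```

Under this approximation the first two candidates pass and the last two fail, which matches the two constraints mentioned in the thread: no leading digit and no colon.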


Dr. Rutger A. Vos
Bioinformaticist
Naturalis Biodiversity Center
Visiting address: Office A109, Einsteinweg 2, 2333 CC, Leiden, the
Netherlands
Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands
http://rutgervos.blogspot.com


hlapp avatar hlapp commented on June 24, 2024

On Aug 15, 2013, at 5:44 AM, Rutger Vos wrote:

> What do the UUIDs look like? [...] instance documents must conform to the production rules of NCNames (probably most importantly: start with a letter or an underscore). If they don't, I don't see how the validator is at fault here.

UUIDs can start with a digit.

@cboettig: I suggest that if you choose UUIDs, you put them in the form of a urn:uuid: scheme. See http://www.ietf.org/rfc/rfc4122.txt


rvosa avatar rvosa commented on June 24, 2024

IDs need to be non-colonized names, i.e. strings without colons. If I
understand your suggestion correctly, the UUIDs would contain colons, which
would be a no-no.


hlapp avatar hlapp commented on June 24, 2024

On Aug 16, 2013, at 6:27 AM, Rutger Vos wrote:

> IDs need to be non-colonized names, i.e. strings without colons. If I understand your suggestion correctly, the UUIDs would contain colons, which would be a no-no.

So HTTP URIs can't be IDs?


rvosa avatar rvosa commented on June 24, 2024

Not normally. However, IDs can become part of HTTP URIs when transforming documents to RDF as they are then made globally unique by prefixing them with either the location of the document or the value of xml:base of the nearest ancestor node that contains this attribute. (Note that I didn't just make this up or anything.)

I see where you're going with this line of questioning. If we want HTTP URIs as IDs (good id(ea)), use xml:base.


cboettig avatar cboettig commented on June 24, 2024

I was just using the uuid package, which generates uuids that look like:

> UUIDgenerate()
[1] "f7af80aa-dfb2-4134-aa82-db1c0e9e7980"

No colons, so I'm not sure why the validator (accessed through the R wrapper to libxml2) is unhappy.

Regardless, not sure uuids were a good idea for this purpose anyhow. The current workflow doesn't give the user the same flexibility over the DOM directly, so we probably don't have to worry about a user creating two S4 "tree" objects and then sticking them in the same nexml with duplicated IDs.

Instead, there is a method for phylo->nexml that creates the ids for otus as t1, t2..., nodes as n1, n2..., edges as e1, e2... etc (done). A separate method for multiPhylo -> nexml will allow the user to add multiple trees while avoiding id conflicts (not written yet). With a sensible top-level API I think we should be fine using these simple ids(?)


cboettig avatar cboettig commented on June 24, 2024

Okay, I think we're happy with ids that are only locally unique for the moment (though I'm still unsure what the validator found wrong with the uuid above). Anyway, closing this issue.


cboettig avatar cboettig commented on June 24, 2024

It turns out that strings starting with a digit are not valid ids, and uuids often start with a digit.

To address this, all functions that assign ids use the internal method nexml_id(), which builds locally unique ids from a given character prefix and an internal counter; e.g. edges use nexml_id("e") to get ids like e1, e2, etc. The counter for a given prefix starts at 1 and increases each time that prefix is used in the R session, unless reset with reset_id_counter(). This local counter scheme is the default.
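
To make the counter scheme concrete, here is a minimal sketch in base R of how such a per-prefix counter could work. This is an illustration written from the description above, not the RNeXML source: the names `sketch_nexml_id`, `sketch_reset_id_counter`, and `id_counters` are hypothetical, and the real nexml_id() may be implemented differently.

```r
# Hypothetical re-implementation for illustration only; not the RNeXML code.
# One counter per prefix is kept in a private environment.
id_counters <- new.env()

sketch_nexml_id <- function(prefix) {
  # Look up the last number issued for this prefix (0 if none yet).
  last <- if (exists(prefix, envir = id_counters, inherits = FALSE)) {
    get(prefix, envir = id_counters)
  } else {
    0L
  }
  assign(prefix, last + 1L, envir = id_counters)
  paste0(prefix, last + 1L)
}

sketch_reset_id_counter <- function() {
  # Forget every counter so that numbering starts again at 1.
  rm(list = ls(id_counters), envir = id_counters)
  invisible(NULL)
}

first_edge  <- sketch_nexml_id("e")  # "e1"
second_edge <- sketch_nexml_id("e")  # "e2"
first_node  <- sketch_nexml_id("n")  # "n1"
sketch_reset_id_counter()
after_reset <- sketch_nexml_id("e")  # "e1" again
```

Because every id begins with its letter prefix, ids produced this way never start with a digit, which sidesteps the validation error discussed earlier in the thread.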

The command options(uuid=TRUE) will make RNeXML use uuids for all id attributes instead. To avoid the validation error, these are prepended with uuid-. This option can be issued per session or put in the user's .Rprofile as persistent configuration. options(uuid=FALSE) sets the behavior back to the local identifiers.
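
The switch between the two schemes might look roughly like the following base R sketch. The helper `sketch_choose_id` is hypothetical and only mirrors the behaviour described above; the raw uuid is passed in as an argument so the sketch needs no extra packages (in the thread it comes from UUIDgenerate() in the uuid package), and the uuid string used here is a made-up value chosen because it starts with a digit.

```r
# Hypothetical helper for illustration; not the RNeXML implementation.
# Returns a "uuid-" prefixed id when options(uuid = TRUE) is set,
# and the local counter-style id otherwise.
sketch_choose_id <- function(local_id, raw_uuid) {
  if (isTRUE(getOption("uuid"))) paste0("uuid-", raw_uuid) else local_id
}

# Illustrative uuid that starts with a digit (assumed value, not generated here).
raw_uuid <- "1b4e28ba-2fa1-11d2-883f-0016d3cca427"

old <- options(uuid = TRUE)   # switch to the global scheme, remembering the old value
global_id <- sketch_choose_id("e1", raw_uuid)
options(old)                  # restore the previous setting

default_id <- sketch_choose_id("e1", raw_uuid)

# The prefixed form begins with a letter even though the raw uuid does not.
starts_ok <- grepl("^[A-Za-z_]", c(raw_uuid, global_id))
```

The point of the prefix is visible in `starts_ok`: the raw value fails the leading-character rule while the prefixed one passes.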

test_global_ids.R provides a unit test checking that we generate valid NeXML when using the global (uuid) id scheme.
