All your assumptions are correct. For example, when I convert a nexus file to nexml, the states (0/1) in the matrix end up as the values of the symbol attribute. For labels, you have to watch out a little bit to ensure that i) they are optional and not guaranteed to be unique (so don't use them as some sort of key or ID) ii) the label of the row or node should take precedence over the label of the otu, if both are available. You can imagine, for example, that the rows in an alignment have a label such as Genus_species_genbank_accession, while the otu is just Genus_species. We would like to retain both, in their relevant locations.
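The precedence rule above can be sketched in base R. This is a minimal illustration under stated assumptions, not RNeXML code. `display_label()` is a hypothetical helper, missing labels are assumed to be `NA` or `""`, and the row data are made up. The helper prefers the row (or node) label, then the otu label, and falls back to the id only when neither exists. The ids remain the keys, because labels are optional and may repeat.

```r
# Hypothetical helper, not part of RNeXML: pick a display label for each row.
# Precedence: row/node label, then otu label, then the id as a last resort.
# Missing labels are assumed to be NA or "".
display_label <- function(row_label, otu_label, id) {
  is_missing <- function(x) is.na(x) | x == ""
  out <- id
  # Overwrite in increasing order of precedence, so the row label wins.
  out[!is_missing(otu_label)] <- otu_label[!is_missing(otu_label)]
  out[!is_missing(row_label)] <- row_label[!is_missing(row_label)]
  out
}

# Made-up rows: one with both labels, one with only an otu label, one with neither.
rows <- data.frame(
  id        = c("row1", "row2", "row3"),
  row_label = c("Anolis_carolinensis_acc1", NA, ""),
  otu_label = c("Anolis_carolinensis", "Anolis_sagrei", NA),
  stringsAsFactors = FALSE
)

labels <- display_label(rows$row_label, rows$otu_label, rows$id)
# The ids stay the keys; labels are only display values and may repeat.
names(labels) <- rows$id
labels
```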
@rvosa Great, thanks. I added labels to the char elements in your example comp_analysis.xml file, after which we can do:
f <- system.file("examples", "comp_analysis.xml", package="RNeXML")
nex <- read.nexml(f)
get_characters(nex)
and get back:
         log snout-vent length reef-dwelling
taxon_8             -3.2777799             0
taxon_9              2.0959433             1
taxon_10             3.1373971             0
taxon_1              4.7532824             1
taxon_2             -2.7624146             0
taxon_3              2.1049413             0
taxon_4             -4.9504770             0
taxon_5              1.2714718             1
taxon_6              6.2593966             1
taxon_7              0.9099634             1
Note that:
- the state symbols (0, 1) have been substituted for the state ids;
- the otu labels (taxon_n) have been substituted for the otu ids on the row elements (as rownames);
- the char labels have been substituted for the char ids (as colnames).
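Each of these substitutions is a lookup from an id to a symbol or label. Here is a minimal base-R sketch of that idea for a single discrete column. The lookup tables and ids (`s1`, `t1`, `c1`, ...) are made up, and this is not how `get_characters()` is implemented internally.

```r
# Made-up lookup tables standing in for the <states>, <otus> and <char>
# elements of a NeXML file; the ids (s1, t1, c1, ...) are hypothetical.
state_symbols <- c(s1 = "0", s2 = "1")
otu_labels    <- c(t1 = "taxon_1", t2 = "taxon_2")
char_labels   <- c(c1 = "reef-dwelling")

# Raw matrix as stored: cells hold state ids, rows are otu ids, columns are char ids.
raw <- matrix(c("s2", "s1"), nrow = 2,
              dimnames = list(c("t1", "t2"), "c1"))

# Look up each cell's symbol and swap ids for labels in the dimnames.
tidy <- matrix(unname(state_symbols[as.vector(raw)]),
               nrow = nrow(raw),
               dimnames = list(unname(otu_labels[rownames(raw)]),
                               unname(char_labels[colnames(raw)])))
tidy
```

In this sketch, a missing label would surface as `NA` in the dimnames. That is exactly the point where a fallback choice (keep the id, or something else) has to be made.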
As you point out, we should be a bit careful when substituting labels. Currently, if no label is available, I keep the id (the otu id or char id) instead. Perhaps that behaviour is undesirable, though, since it means the function sometimes returns the char id and sometimes a label?
- Should we be using the label on the row itself (which I guess might read species_genus_accession) instead of the label on the corresponding otu for the rownames above?
- Or should we always return both the row label and the corresponding otu label as separate columns (even when one or both labels are absent, or they are the same)? On one hand, this behavior seems the most consistent; on the other, it's not likely to be convenient for most users (after all, we already have the extreme of a consistent but inconvenient representation of the data in the S4 object).
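The second option could look roughly like the sketch below: the row id stays the key, and both labels become ordinary columns next to the character data. `with_both_labels()` and the example values are hypothetical, not an RNeXML function or output.

```r
# Hypothetical helper: keep the row id as the key and return both labels
# as ordinary columns next to the character data.
with_both_labels <- function(ids, row_labels, otu_labels, values) {
  data.frame(
    row_id    = ids,
    row_label = row_labels,   # NA when the <row> element has no label
    otu_label = otu_labels,   # NA when the <otu> has no label; may equal row_label
    values,                   # the character columns themselves
    check.names = FALSE,      # keep names such as "reef-dwelling" unmangled
    stringsAsFactors = FALSE
  )
}

df <- with_both_labels(
  ids        = c("row1", "row2"),
  row_labels = c("Anolis_carolinensis_acc1", NA),
  otu_labels = c("Anolis_carolinensis", "Anolis_sagrei"),
  values     = data.frame(`reef-dwelling` = c("0", "1"),
                          check.names = FALSE, stringsAsFactors = FALSE)
)
df
```

This layout is consistent: every row always has the same three identifying columns. The cost is that users must choose which label to display themselves.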
@hlapp @balhoff Is my use of the char node's label attribute consistent with the Phenoscape expectation of where to find a 'human readable' column name? One might imagine that the columns should instead be labeled with abbreviation codes (e.g. LSVL instead of log snout-vent length, since spaces and dashes can be somewhat troublesome in column names), with the abbreviation defined in full elsewhere? This is the convention in the Ecological Metadata Language (EML): columns are labeled by a short "attributeName" and defined by a longer "attributeDefinition" text string.
@cboettig yes we use the label attribute for a human readable column name.
On Nov 22, 2013, at 12:27 PM, Carl Boettiger wrote:
@hlapp @balhoff Is my use of char node's label attribute consistent with the phenoscape expectation of where to find a 'human readable' column name? One might imagine that the columns should instead be labeled with abbreviation codes (e.g. LSVL instead of log snout-vent length, since spaces and dashes can be somewhat troublesome in column names), and then have the abbreviation defined in full elsewhere? This is the convention in the Ecological Metadata Language (EML); that columns are labeled by a short "attributeName" and defined by a longer "attributeDefinition" text string.
I don't think there's a convention stating that the char node label needs to be a short string or conversely can't be one. In Phenoscape this is just taken from the character matrix. Systematists don't ever call a character "LSVL" as far as I've seen. But that doesn't mean that other sources for matrices can't be using a short name that's expanded elsewhere.
If there is only one form, I'd suggest using rdfs:label. If there are two, a short form serving as the column label and a long form giving a human-readable definition, I'd suggest using rdfs:label for whatever is supposed to be the column label, and dc:description for its explanation.
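The two-property convention can be sketched as a lookup over per-character annotations. The annotations below are a made-up table of (char id, property, content) triples, loosely modeled on NeXML's property/content meta annotations. The table layout and the `lookup_meta()` helper are assumptions, not RNeXML code. rdfs:label becomes the column label, and dc:description, when present, becomes the explanation.

```r
# Made-up annotations: one row per (char id, property, content) triple.
meta <- data.frame(
  char     = c("c1", "c1", "c2"),
  property = c("rdfs:label", "dc:description", "rdfs:label"),
  content  = c("LSVL", "log snout-vent length", "reef-dwelling"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first value of `prop` for a char, or NA if absent.
lookup_meta <- function(char_id, prop) {
  hit <- meta$content[meta$char == char_id & meta$property == prop]
  if (length(hit) == 0) NA_character_ else hit[1]
}

chars <- unique(meta$char)
columns <- data.frame(
  char        = chars,
  label       = vapply(chars, lookup_meta, character(1),
                       prop = "rdfs:label", USE.NAMES = FALSE),
  description = vapply(chars, lookup_meta, character(1),
                       prop = "dc:description", USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
columns
```

A character with only one form (here `c2`) simply gets `NA` as its description, which matches the "if there is only one form" case.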