Comments (3)
So, I just wanted to make sure that we were handling this third case
appropriately. We could instead add a new otus block in this case with all
the otus for the characters, which would mean duplicate entries of certain
otus. I'm not sure just how problematic that would be?

My sense is that it would be better to create a new otus node, with
possibly duplicate otus, than to automagically fold newly encountered
labels into an existing otus node without any user intervention. We can't
know ahead of time why people have an existing otus node (or multiples of
them) so it's not very polite to just start poking around in them. What
would be nice is to then, separately, have some sort of merge_by_name()
method that does the merging, perhaps with some additional intelligence
(e.g. ignore suffixes that look like accession numbers).

Because a characters node must refer to a single otus node for reference
(I think), there's no point in checking for matches across multiple otus
sets, or writing only the unmatched otu labels into a separate otus node. I
assume it's no trouble to have more otu nodes in an otus block than any one
trees or characters block actually needs?
Yes, it's OK to have non-referenced otus blocks, as many as you like, and
non-referenced otu nodes in a block.
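
To make the merge_by_name() idea above concrete, here is a minimal sketch in Python rather than R, purely as an illustration. Everything in it is an assumption: the function names, the dict-of-lists representation of otus blocks, and especially the regular expression for what "looks like an accession number" are hypothetical and are not part of RNeXML or Bio::Phylo.

```python
import re

# Hypothetical pattern: a trailing "_" or " " followed by one or two capital
# letters and 5-8 digits, optionally with a ".version" (e.g. "_AB123456.1").
ACCESSION_SUFFIX = re.compile(r"[_ ][A-Z]{1,2}\d{5,8}(?:\.\d+)?$")


def normalize(label, strip_accessions=True):
    """Return the label with an accession-like suffix removed, if requested."""
    return ACCESSION_SUFFIX.sub("", label) if strip_accessions else label


def merge_by_name(blocks, strip_accessions=True):
    """Merge otus blocks by (normalized) label.

    blocks: dict mapping otus block id -> list of otu labels.
    Returns (merged, mapping): the unique names in first-seen order, and a
    dict from (block id, original label) to the merged name, so references
    in trees or characters blocks could be rewritten afterwards.
    """
    merged, mapping, seen = [], {}, set()
    for block_id, labels in blocks.items():
        for label in labels:
            name = normalize(label, strip_accessions)
            mapping[(block_id, label)] = name
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged, mapping


blocks = {
    "os1": ["Homo_sapiens", "Pan_troglodytes"],
    "os2": ["Homo_sapiens_AB123456", "Pongo_abelii"],
}
merged, mapping = merge_by_name(blocks)
print(merged)
```

Because the merge is a separate, explicit call, the existing otus blocks are only touched when the user asks for it, which is the point made above.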
from rnexml.
@rvosa Thanks, that makes sense. Will modify the function such that we will still use an existing otus
block if it has all the taxa of interest (as we might often expect when adding a character matrix to a file containing a phylogeny), but will create an entirely new otus
block if any taxa in the character matrix are unmatched.
Obviously this same issue comes up when adding trees to an existing nexml file (I just hadn't written that method yet -- so far our methods only support creating a new nexml object/file given a tree). Naturally I will follow the same rule of thumb -- create a new otus
block if any of the taxa are unmatched.
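
The rule of thumb described above could be sketched roughly as follows. This is a Python illustration under stated assumptions, not the RNeXML implementation: `assign_otus_block`, its arguments, and the plain dict standing in for the otus blocks of a nexml object are all hypothetical.

```python
def assign_otus_block(otus_blocks, labels, new_id):
    """Pick the otus block a new characters or trees block should reference.

    otus_blocks: dict mapping otus block id -> list of otu labels.
    labels: taxon labels used by the matrix or tree being added.
    new_id: id to use if a fresh otus block has to be created.
    Returns the id of the block to reference; otus_blocks is updated in place.
    """
    wanted = set(labels)
    for block_id, block_labels in otus_blocks.items():
        if wanted <= set(block_labels):
            # Every label is already present: reuse this block unchanged.
            return block_id
    # At least one label is unmatched: leave the existing blocks alone and
    # write all labels (including duplicates of existing otus) to a new block.
    otus_blocks[new_id] = list(labels)
    return new_id


blocks = {"os1": ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]}
print(assign_otus_block(blocks, ["Homo sapiens", "Pan troglodytes"], "os2"))  # os1
print(assign_otus_block(blocks, ["Homo sapiens", "Pongo abelii"], "os3"))     # os3
```

Note that the second call duplicates "Homo sapiens" in the new block rather than adding "Pongo abelii" to the existing one.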
Still, I cannot help feeling that this may not produce the behavior the user expects in certain circumstances. Imagine a user adds a tree and then the corresponding character matrix to the same nexml file, and the matrix includes a few species not on the tree. This will result in two separate otus blocks with 99% the same contents, which is probably not the desired structure. In particular, I worry that this makes annotating the otu nodes difficult. It may also make it harder for a machine to recognize that the matrix and phylogeny cover the same otus (e.g. it must explicitly check the matches, rather than just compare the otus attribute of each block).
A user iteratively adding a set of characters and/or trees could wind up with lots of mostly duplicated otus blocks. Perhaps we must let the user toggle this behavior in the function call? We would still need a default option (or have a more confusing function).
I think it's OK to ask for user intervention when integrating data. It's
how I do it in Bio::Phylo for the exact same use case - but I can also
appeal to a higher authority here: it's also how it's done in mesquite. If
we discover it's a frequent PITA (I doubt it) then it can always be changed.