From this page it seems like ^ot:dataDeposit is always going to be a URL (for Dryad, TreeBASE or whatever), so maybe something like:
get_study_external_data <- function(sm) {
  # sm: study metadata as returned by get_study_meta()
  sm[["nexml"]][["^ot:dataDeposit"]][["@href"]]
}
This works when there is a link and returns NULL otherwise; I don't know if NA would be better in this case. There is probably also a better function name.
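If NA turns out to be preferable, a minimal variant could look like the sketch below. It assumes the metadata has the nested list layout used above; the two mock objects are hand-built stand-ins for real get_study_meta() results so the snippet runs offline. Returning NA_character_ keeps the result a length-one character vector, which makes it safe to use with vapply() across many studies.

```r
# Sketch: same lookup as get_study_external_data(), but returns NA instead
# of NULL when the study has no ^ot:dataDeposit link.
get_study_external_data <- function(sm) {
  href <- sm[["nexml"]][["^ot:dataDeposit"]][["@href"]]
  if (is.null(href)) NA_character_ else href
}

# Mock metadata (assumption): mimics the nested list from get_study_meta()
with_link <- list(nexml = list(
  "^ot:dataDeposit" = list(
    "@href" = "http://purl.org/phylo/treebase/phylows/study/TB2:S10691"
  )
))
without_link <- list(nexml = list())

# Type-stable across studies, with or without a deposit link
res <- vapply(list(with_link, without_link), get_study_external_data, character(1))
res
```

With NULL, the same vapply() call would fail for studies without a link, which is one argument for NA here.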
It occurs to me that having the DOIs makes it straightforward to find nucleotide sequences in NCBI, if they exist. It takes a bit of messing around with strings, but something like this works:
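library(rotl)
# Study metadata and its publication DOI (stored as a dx.doi.org URL)
sm <- get_study_meta("pg_1940")
sm_doi <- sm[["nexml"]][["^ot:studyPublication"]][["@href"]]
# Strip the URL prefix and restrict the search to PubMed's [doi] field
doi_term <- paste0(sub("http://dx.doi.org/", "", sm_doi, fixed = TRUE), "[doi]")
paper <- rentrez::entrez_search(db = "pubmed", term = doi_term)
# Follow PubMed -> PopSet links for the matching paper
linked <- rentrez::entrez_link(dbfrom = "pubmed", db = "popset", id = paper$ids)
linked$links$pubmed_popset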
[1] "295388201" "295388169" "295388107" "295388070" "295388035"
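The string handling above hard-codes the http://dx.doi.org/ prefix. A slightly more tolerant helper is sketched below; the name doi_search_term() is hypothetical, and it assumes DOIs arrive either bare or as http(s) URLs on dx.doi.org or doi.org. Its output is the kind of term that can be passed to rentrez::entrez_search().

```r
# Hypothetical helper: turn a DOI (bare or as a doi.org URL) into a PubMed
# search term restricted to the [doi] field.
doi_search_term <- function(doi) {
  # Strip an optional http:// or https:// prefix on doi.org or dx.doi.org
  bare <- sub("^https?://(dx\\.)?doi\\.org/", "", doi)
  paste0(bare, "[doi]")
}

doi_search_term("http://dx.doi.org/10.1017/S001667231000008X")
# "10.1017/S001667231000008X[doi]"
doi_search_term("https://doi.org/10.1017/S001667231000008X")
# "10.1017/S001667231000008X[doi]"
```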
If we think this sort of thing is useful, I'm happy to work on integrating (at least) treebase, fulltext and rentrez into a get_external_data(ottid, data_source, ...) sort of function.
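One possible shape for such a function is a thin dispatcher that validates data_source and hands off to a source-specific fetcher. This is only a sketch: the fetch_* functions are hypothetical placeholders that simply echo their inputs, where real versions would call treebase, fulltext or rentrez.

```r
# Placeholder fetchers (hypothetical): real versions would query TreeBASE,
# full-text sources or NCBI respectively.
fetch_treebase <- function(ottid, ...) list(source = "treebase", ottid = ottid)
fetch_fulltext <- function(ottid, ...) list(source = "fulltext", ottid = ottid)
fetch_rentrez  <- function(ottid, ...) list(source = "rentrez", ottid = ottid)

# Sketch of the proposed get_external_data(ottid, data_source, ...) signature
get_external_data <- function(ottid,
                              data_source = c("treebase", "fulltext", "rentrez"),
                              ...) {
  # match.arg() defaults to the first choice and errors on unknown sources
  data_source <- match.arg(data_source)
  fetcher <- switch(data_source,
    treebase = fetch_treebase,
    fulltext = fetch_fulltext,
    rentrez  = fetch_rentrez
  )
  # Pass any extra arguments through to the source-specific fetcher
  fetcher(ottid, ...)
}

str(get_external_data("placeholder_id", "rentrez"))
```

Adding a new source would then only need one more fetcher and one more switch() branch.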
Hi @dwinter,
Yes, it would be great if you could get started on this!
I like the idea of a get_external_data function.
There are a few methods to retrieve elements of the metadata associated with studies.
For instance, get_publication(get_study_meta("pg_1940")) returns the citation string for the publication, with the DOI stored as an attribute:
[1] "Van der Linde, K., Houle D., Spicer G.S., & Steppan S. 2010. A supermatrix-based molecular phylogeny of the family Drosophilidae. Genetics Research 92 (1): 25-38."
attr(,"DOI")
[1] "http://dx.doi.org/10.1017/S001667231000008X"
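Because the DOI is stored as an attribute, it can be pulled out with attr(). The snippet below builds a stand-in object by hand rather than calling rotl, so it runs offline; that the real return value has exactly this shape is an assumption based on the printed output above.

```r
# Mock of the value get_publication() is described as returning (assumption):
# a citation string with the DOI attached as a "DOI" attribute.
pub <- structure(
  "Van der Linde, K., Houle D., Spicer G.S., & Steppan S. 2010. A supermatrix-based molecular phylogeny of the family Drosophilidae. Genetics Research 92 (1): 25-38.",
  DOI = "http://dx.doi.org/10.1017/S001667231000008X"
)

doi_url <- attr(pub, "DOI")  # the DOI as a URL
citation <- as.vector(pub)   # the citation text with attributes dropped
doi_url
```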
I am currently working on improving the interface of the studies_find_studies and studies_find_trees functions so that they return more information than just the identifiers, letting users look into the studies before selecting them. I'll request feedback when I think I have something that works (probably later today).
I have finally made a start on this. In case anyone has comments about the design, here's what I have so far. First, a function to find external IDs:
links <- study_external_IDs("pg_1940")
links
External data identifiers for study pg_1940
$doi: 10.1017/S001667231000008X
$pubmed_id: 20433773
$popset_ids: vector of 5 IDs
$nucleotide_ids: vector of 164 IDs
$external_data_url http://purl.org/phylo/treebase/phylows/study/TB2:S10691
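Output like this can come from an S3 print method that summarizes long ID vectors by their length. A minimal sketch follows; the constructor new_external_ids(), the class name study_external_data and the field layout are hypothetical, inferred from the printed output rather than taken from the actual code, and the ID vectors are placeholders.

```r
# Hypothetical constructor: field names mirror the printed output above.
new_external_ids <- function(study_id, doi, pubmed_id, popset_ids,
                             nucleotide_ids, external_data_url) {
  structure(
    list(doi = doi, pubmed_id = pubmed_id, popset_ids = popset_ids,
         nucleotide_ids = nucleotide_ids, external_data_url = external_data_url),
    study_id = study_id,
    class = "study_external_data"
  )
}

# Print one line per field, collapsing vectors of IDs to a count
print.study_external_data <- function(x, ...) {
  cat("External data identifiers for study ", attr(x, "study_id"), "\n", sep = "")
  for (field in names(x)) {
    value <- x[[field]]
    shown <- if (length(value) == 0) {
      "none"
    } else if (length(value) > 1) {
      paste("vector of", length(value), "IDs")
    } else {
      value
    }
    cat(" $", field, ": ", shown, "\n", sep = "")
  }
  invisible(x)
}

# Placeholder ID vectors stand in for real NCBI IDs
ids <- new_external_ids(
  study_id = "pg_1940",
  doi = "10.1017/S001667231000008X",
  pubmed_id = "20433773",
  popset_ids = paste0("popset_", 1:5),
  nucleotide_ids = paste0("nuc_", 1:164),
  external_data_url = "http://purl.org/phylo/treebase/phylows/study/TB2:S10691"
)
print(ids)
```

Keeping the full vectors in the object while printing only their lengths means links$nucleotide_ids can still be passed on directly.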
And a couple of functions to summarize the available data:
summarize_nucleotide_data(links$nucleotide_ids[1:10])
uid
295388256 295388256
295388254 295388254
295388253 295388253
295388251 295388251
295388249 295388249
295388247 295388247
295388245 295388245
295388243 295388243
295388241 295388241
295388239 295388239
title
295388256 Hirtodrosophila thoracis cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388254 Hirtodrosophila duncani cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388253 Hirtodrosophila sp. KVDL-2010 cytochrome c oxidase subunit III-like (COIII) gene, partial sequence; mitochondrial
295388251 Mycodrosophila claytonae cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388249 Drosophila pinicola cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388247 Drosophila macrospina cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388245 Drosophila guttifera cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388243 Drosophila falleni cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388241 Zaprionus indianus cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388239 Zaprionus sepsoides cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
slen organism completeness
295388256 445 Hirtodrosophila thoracis
295388254 445 Hirtodrosophila duncani
295388253 363 Hirtodrosophila sp. KVDL-2010
295388251 445 Mycodrosophila claytonae
295388249 445 Drosophila pinicola
295388247 445 Drosophila macrospina
295388245 445 Drosophila guttifera
295388243 445 Drosophila falleni
295388241 445 Zaprionus indianus
295388239 445 Zaprionus sepsoides
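A table like this can be assembled from per-record summaries; with rentrez these would typically come from rentrez::entrez_summary(db = "nucleotide", id = ...). The sketch below skips the network call and works on a hand-built list of two records copied from the output above, so the helper name records_to_df() and the record layout are assumptions.

```r
# Hypothetical helper: flatten a list of summary records into a data frame
# with one row per sequence, keyed by uid.
records_to_df <- function(records) {
  # Extract one field from every record, using NA when a record lacks it
  pull <- function(field) {
    vapply(records, function(r) {
      v <- r[[field]]
      if (is.null(v)) NA_character_ else as.character(v)
    }, character(1))
  }
  df <- data.frame(
    uid = pull("uid"),
    title = pull("title"),
    slen = as.integer(pull("slen")),
    organism = pull("organism"),
    completeness = pull("completeness"),
    stringsAsFactors = FALSE
  )
  rownames(df) <- df$uid
  df
}

records <- list(
  list(uid = "295388256",
       title = "Hirtodrosophila thoracis cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial",
       slen = 445, organism = "Hirtodrosophila thoracis", completeness = ""),
  list(uid = "295388253",
       title = "Hirtodrosophila sp. KVDL-2010 cytochrome c oxidase subunit III-like (COIII) gene, partial sequence; mitochondrial",
       slen = 363, organism = "Hirtodrosophila sp. KVDL-2010", completeness = "")
)
df <- records_to_df(records)
df[, c("slen", "organism")]
```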
That looks good!
My only comment, from a quick look at the code, is that it might be worth using the httr::parse_url function to get the DOI instead of relying on sub(), e.g.
httr::parse_url("http://dx.doi.org/10.1017/S001667231000008X")$path
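Wrapped in a small helper, the parse_url() approach might look like the sketch below (doi_from_url() is a hypothetical name, and httr must be installed). Because it reads the path component of the URL rather than matching one hard-coded prefix, it also handles the https://doi.org/ form. It assumes its input is a full resolver URL.

```r
# Hypothetical helper: extract the bare DOI from a DOI resolver URL by
# taking the URL's path component instead of stripping a fixed prefix.
doi_from_url <- function(url) {
  httr::parse_url(url)$path
}

doi_from_url("http://dx.doi.org/10.1017/S001667231000008X")
doi_from_url("https://doi.org/10.1017/S001667231000008X")
```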