
Comments (5)

dwinter avatar dwinter commented on September 25, 2024

From this page it seems like ^ot:dataDeposit is always going to be a URL (for Dryad or TreeBASE or whatever), so maybe

get_study_external_data <- function(sm){
   sm[["nexml"]][["^ot:dataDeposit"]][["@href"]]
}

This works when there is a link and returns NULL otherwise. I don't know whether NA would be better in this case. There is probably also a better function name.
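For comparison, here is a minimal sketch of the NA-returning variant. It assumes the metadata is a nested list shaped like the one indexed above; the two mock lists are made up for illustration and stand in for real get_study_meta() output.

```r
# Sketch only: same lookup as above, but returns NA_character_ instead of NULL
# when the study has no data deposit link.
get_study_external_data <- function(sm) {
  href <- sm[["nexml"]][["^ot:dataDeposit"]][["@href"]]
  if (is.null(href) || !nzchar(href)) NA_character_ else href
}

# Mock metadata (hypothetical), mimicking the nested list structure
with_link <- list(
  nexml = list(
    `^ot:dataDeposit` = list(
      `@href` = "http://purl.org/phylo/treebase/phylows/study/TB2:S10691"
    )
  )
)
without_link <- list(nexml = list())

get_study_external_data(with_link)
get_study_external_data(without_link)
```

Returning NA_character_ keeps the result a length-one character vector, which is easier to use with vapply() or in a data frame column than NULL.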

from rotl.

dwinter avatar dwinter commented on September 25, 2024

It occurs to me that having the DOIs makes it straightforward to find nucleotide sequences if they exist in NCBI. It takes a bit of messing around with characters, but something like this:

sm <- get_study_meta("pg_1940")
sm_doi <- sm[["nexml"]][["^ot:studyPublication"]][["@href"]]
doi_term <- paste0(sub("http://dx.doi.org/", "", sm_doi, fixed = TRUE), "[doi]")
paper <- rentrez::entrez_search(db = "pubmed", term = doi_term)
linked <- rentrez::entrez_link(dbfrom = "pubmed", db = "popset", id = paper$ids)
linked$links$pubmed_popset
[1] "295388201" "295388169" "295388107" "295388070" "295388035"

If we think this sort of thing is useful, I'm happy to work on integrating (at least) treebase, fulltext and rentrez into a get_external_data(ottid, data_source, ...) sort of function?
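One possible shape for such a function is a thin dispatcher, sketched below. Everything here is an assumption for illustration: the fetch_* handlers are hypothetical stubs that only return a label, and the first argument is named id because the examples in this thread pass a study ID such as "pg_1940". Real handlers would call the treebase, fulltext and rentrez packages.

```r
# Hypothetical stub handlers; they do no network access and only label the request.
fetch_treebase <- function(id, ...) paste("treebase data for", id)
fetch_fulltext <- function(id, ...) paste("fulltext data for", id)
fetch_rentrez  <- function(id, ...) paste("rentrez data for", id)

# Sketch of a dispatcher: validate the requested source, then hand off to its handler.
get_external_data <- function(id,
                              data_source = c("treebase", "fulltext", "rentrez"),
                              ...) {
  data_source <- match.arg(data_source)  # errors on an unknown source
  handler <- switch(data_source,
                    treebase = fetch_treebase,
                    fulltext = fetch_fulltext,
                    rentrez  = fetch_rentrez)
  handler(id, ...)
}

get_external_data("pg_1940", "rentrez")
```

Using match.arg() means a misspelled or unsupported data_source fails early with a clear message, and adding a new source only needs one new handler and one new switch() entry.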


fmichonneau avatar fmichonneau commented on September 25, 2024

Hi @dwinter,

Yes, it would be great if you could get started on this!

I like the idea of a get_external_data function.

There are a few methods to retrieve elements of the metadata associated with studies.

For instance, get_publication(get_study_meta("pg_1940")) will return the string about the publication and the DOI as an attribute:

[1] "Van der Linde, K., Houle D., Spicer G.S., & Steppan S. 2010. A supermatrix-based molecular phylogeny of the family Drosophilidae. Genetics Research 92 (1): 25-38."
attr(,"DOI")
[1] "http://dx.doi.org/10.1017/S001667231000008X"

I am currently working on improving the interface with the studies_find_studies and studies_find_trees functions to return more information than just the identifiers so users can look into the studies before selecting them. I'll request feedback when I think I have something that works (probably later today).


dwinter avatar dwinter commented on September 25, 2024

I have finally made a start on this. In case anyone has comments about the design, here's what I have for now. A function to find external IDs:

links <- study_external_IDs("pg_1940")
links
External data identifiers for study pg_1940 
 $doi:  10.1017/S001667231000008X 
 $pubmed_id:  20433773 
 $popset_ids: vector of 5 IDs 
 $nucleotide_ids: vector of 164 IDs
 $external_data_url http://purl.org/phylo/treebase/phylows/study/TB2:S10691

And a couple of functions to summarize the available data

summarize_nucleotide_data(links$nucleotide_ids[1:10])

                uid
295388256 295388256
295388254 295388254
295388253 295388253
295388251 295388251
295388249 295388249
295388247 295388247
295388245 295388245
295388243 295388243
295388241 295388241
295388239 295388239
                                                                                                                      title
295388256                Hirtodrosophila thoracis cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388254                 Hirtodrosophila duncani cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388253 Hirtodrosophila sp. KVDL-2010 cytochrome c oxidase subunit III-like (COIII) gene, partial sequence; mitochondrial
295388251                Mycodrosophila claytonae cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388249                     Drosophila pinicola cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388247                   Drosophila macrospina cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388245                    Drosophila guttifera cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388243                      Drosophila falleni cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388241                      Zaprionus indianus cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388239                     Zaprionus sepsoides cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
          slen                      organism completeness
295388256  445      Hirtodrosophila thoracis             
295388254  445       Hirtodrosophila duncani             
295388253  363 Hirtodrosophila sp. KVDL-2010             
295388251  445      Mycodrosophila claytonae             
295388249  445           Drosophila pinicola             
295388247  445         Drosophila macrospina             
295388245  445          Drosophila guttifera             
295388243  445            Drosophila falleni             
295388241  445            Zaprionus indianus             
295388239  445           Zaprionus sepsoides  


fmichonneau avatar fmichonneau commented on September 25, 2024

That looks good!

My only comment from looking quickly at the code is that it might be worth using the httr::parse_url function to get the DOI instead of relying on sub(), e.g.

parse_url("http://dx.doi.org/10.1017/S001667231000008X")$path
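If adding a dependency is not wanted, a base R alternative is also possible. The sketch below is an assumption, not code from rotl: doi_from_url is a hypothetical helper that strips an anchored resolver prefix, so it accepts both http and https and both the dx.doi.org and doi.org hosts, and leaves anything else untouched.

```r
# Hypothetical helper: strip a DOI resolver prefix from a URL using base R only.
# The pattern is anchored at the start and escapes the dots in the host name.
doi_from_url <- function(url) {
  sub("^https?://(dx\\.)?doi\\.org/", "", url)
}

doi_from_url("http://dx.doi.org/10.1017/S001667231000008X")
doi_from_url("https://doi.org/10.1017/S001667231000008X")
```

Because sub() is vectorised, the same helper works on a character vector of DOI URLs.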

