rdryad's Introduction

rdryad

Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.*

rdryad is a package to interface with the Dryad data repository.

*This package will be superseded by {deposits}. See Issue #39.

General Dryad API documentation: https://datadryad.org/api/v2/docs/

rdryad docs: https://docs.ropensci.org/rdryad/

Installation

Install rdryad from CRAN:

install.packages("rdryad")

Development version:

remotes::install_github("ropensci/rdryad")
library('rdryad')

Meta

  • Please report any issues or bugs.
  • License: MIT
  • Get citation information for rdryad in R by running citation(package = 'rdryad')
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Data provided by...

Data is provided from the Dryad API.

rdryad's People

Contributors

aammd, alrutten, cboettig, karthik, mja, mpadge, noamross, sckott

rdryad's Issues

No corresponding file found

Hi,
First, thanks for building rdryad =) it's nice to have a package to directly manage Dryad repositories.

I stumbled upon an interesting dataset on Dryad. The DOI of the associated article is 10.1111/ecog.01986, and I can find it on Dryad:

q = rdryad::d_solr_search(q = "10.1111/ecog.01986")
q$handle
# [1] "10255/dryad.116170"

I get the handle back but can't seem to download data from it:

rdryad::download_url("10255/dryad.116170")
# Error: No output from search

Looking at the all_ac field of the query result, another handle is mentioned: 10255/dryad.116171

rdryad::download_url("10255/dryad.116171")
# [1] "http://datadryad.org/bitstream/handle/10255/dryad.116171/DryadArchive.zip?sequence=1"

This one works. Do you know why? Is it specific to Dryad's architecture? Is there a way to programmatically extract this second handle?
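One possible way to pull candidate handles out of a field like all_ac is a regular expression. This is only a sketch: it assumes all_ac is a character string that contains the handles as plain text, and the sample value below is made up for illustration (extract_handles is a hypothetical helper, not part of rdryad).

```r
# Hypothetical all_ac value; the real field contents may be laid out differently.
all_ac <- "Some Author 10255/dryad.116170 Data from: an article 10255/dryad.116171"

# Extract every substring that looks like a Dryad handle.
extract_handles <- function(x) {
  unique(unlist(regmatches(x, gregexpr("10255/dryad\\.[0-9]+", x))))
}

handles <- extract_handles(all_ac)

# Handles other than the one returned by the search are candidate file handles.
file_handles <- setdiff(handles, "10255/dryad.116170")
file_handles
```

Each remaining handle could then be tried with download_url() in turn.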

Change for rdryad2

keep

get a download url

download_url

OAI-PMH functions

download_dryadmetadata -> dr_get_records
dr_identify -> same
dr_listidentifiers -> same
dr_listmetadataformats -> same
dr_listsets -> same

download a file, return path, simple

dryad_getfile

Solr interface functions

d_solr_facet
d_solr_group
d_solr_highlight
d_solr_mlt
d_solr_search
d_solr_stats

remove

getalldryad_metadata
search_dryad

Changes in Dryad's ZIP download feature

Dryad has recently changed our internal process for generating ZIP files that allow download of an entire dataset.

As a result of these changes, the vast majority of downloads will start faster. However, for larger datasets (currently above 200MB), a single-ZIP download will not be available. In these cases, dryad_download will not work, and users should be directed to use dryad_files_download instead.

Uploading to Dryad

Hi,

I would like to upload files to Dryad using rdryad. Do you think that is going to be possible in the near future?

Thank you.

readme xhtml

From Kurt Hornik:

These have README.md files which when converted to (X)HTML using a
current version of pandoc show problems when validated using W3C Markup
Validator, see below.

Most of these problems are caused by using images without giving a name
(so the required alt attribute for <img> is not provided), or using <br>
instead of <br/>.

Pls fix these problems in your README.md files for your next release: in
all cases I inspected, the fixes were obvious and confirmation using
pandoc and W3C markup validator seemed unnecessary.

Please also visit your package check web page at http://cran.r-project.org/web/checks/check_results_PACKAGENAME.html to see if other problems need to be addressed as well.

README fixes

These packages contain README.md files with invalid HTML output created
by pandoc 1.12.4.2 according to W3C-validator.

I attach the HTML errors and warnings found below, and will put copies
of the corresponding HTML files up at
http://www.r-project.org/nosvn/pandoc.

Please investigate the problems and fix as needed.

Afaics, many of the problems are caused by adding "raw" HTML elements in
the README.md files and not realizing that the default output format
"html" is XHTML 1 (and not HTML 5). E.g., a raw <br>
results in an

end tag for "br" omitted, but OMITTAG NO was specified

error.

Best
-k

rdryad.html:
  Valid: FALSE (errors: 1, warnings: 0)
  Errors:
    line  col  message
      43   14  there is no attribute "border"

Name conflict

identify is a function in base graphics. Can we rename it in the next version to avoid masking a base function?
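A quick way to check whether a proposed function name would mask something from the packages attached by default is to look at their exports. The helper below (masked_in) is a hypothetical illustration, not part of rdryad, and the package list is an assumption about a standard R session.

```r
# Packages attached by default in a standard R session (datasets omitted,
# since it only provides data).
default_pkgs <- c("base", "stats", "graphics", "grDevices", "utils", "methods")

# Return the default packages that already export `name`.
masked_in <- function(name) {
  hits <- vapply(
    default_pkgs,
    function(pkg) name %in% getNamespaceExports(pkg),
    logical(1)
  )
  names(hits)[hits]
}

conflicts_identify <- masked_in("identify")
conflicts_prefixed <- masked_in("dr_identify")
```

A prefixed name such as dr_identify avoids the clash, and the masked function always remains reachable as graphics::identify().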

replace ReadImages with some other package

Message from Brian Ripley:

"ReadImages has been orphaned and will be archived shortly: the 'maintainer' never updated it for R 2.14.0 and never gave credit for work he included.

Packages:

HistData ImageMetrics Momocs RXKCD RcmdrPlugin.SCDA SCVA geomorph rdryad

in theory make use of it (only HistData does in its checks). Please make alternative arrangements (e.g. read.jpeg can be replaced by readJPEG in package jpeg) by the end of January."

Can't pass arguments to dryad_datasets

All these calls return the same results:

dryad_datasets()
dryad_datasets(per_page = 25, page = 2)
dryad_datasets(per_page = 200)

This means that the page and per_page parameters (see the Dryad API docs) are not being passed on. This should be a simple fix according to @mpadge
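As a rough illustration of what passing the parameters on could look like, the sketch below builds a query list from the two arguments and drops any that were left unset. build_query is a hypothetical helper and does not reflect rdryad's actual internals.

```r
# Hypothetical helper, not rdryad's actual code: build the query list
# sent to the API, dropping arguments the caller left as NULL.
build_query <- function(page = NULL, per_page = NULL) {
  args <- list(page = page, per_page = per_page)
  args[!vapply(args, is.null, logical(1))]
}

q_default <- build_query()                         # nothing set: API defaults apply
q_paged   <- build_query(per_page = 25, page = 2)  # both values are forwarded
```

If the resulting list is handed to the HTTP client as the query, different page and per_page values should yield different requests.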

Unable to download files from dryad

Hi, I'm trying to download individual files from a published dataset.

From the linked dryad website, I copied file ids for files of interest, however I encountered problems while trying to download them.

One file failed completely:

> dryad_files_download(33893)
Error in file(file): invalid 'description' argument
Traceback:

1. dryad_files_download(33893)
2. Map(function(x, y) each_files_download(x, y, ...), ids, paths)
3. mapply(FUN = f, ..., SIMPLIFY = FALSE)
4. (function (x, y) 
 . each_files_download(x, y, ...))(dots[[1L]][[1L]], dots[[2L]][[1L]])
5. each_files_download(x, y, ...)
6. file(file)

Another file downloaded successfully, but its content is missing:

> dryad_files_download(33892)
[[1]]
[1] "/home/jena/.cache/R/rdryad/33892.docx"
$ ls -l /home/jena/.cache/R/rdryad/33892.docx
-rw-r--r-- 1 jena jena 6 pro 14 12:32 /home/jena/.cache/R/rdryad/33892.docx
$ head /home/jena/.cache/R/rdryad/33892.docx
PK���

Here is a screenshot showing some extra characters after the "PK":
[Screenshot from 2020-12-14 12-48-34]
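A truncated download like this can be detected before trying to open the file. The sketch below relies on the fact that a .docx file is a ZIP container whose first bytes are "PK\003\004". looks_like_zip is a hypothetical helper, and the 6-byte file is a stand-in that mimics the size reported above.

```r
# Check that a file is at least as large as the smallest possible ZIP archive
# (22 bytes) and starts with the ZIP local-file signature "PK\003\004".
looks_like_zip <- function(path, min_bytes = 22) {
  if (!file.exists(path) || file.size(path) < min_bytes) return(FALSE)
  magic <- readBin(path, what = "raw", n = 4)
  identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Simulate a truncated download: 6 bytes beginning with "PK".
truncated <- tempfile(fileext = ".docx")
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x14, 0x00)), truncated)
truncated_ok <- looks_like_zip(truncated)

# A longer stand-in file with the right signature passes this coarse check.
padded <- tempfile(fileext = ".docx")
writeBin(c(as.raw(c(0x50, 0x4b, 0x03, 0x04)), raw(26)), padded)
padded_ok <- looks_like_zip(padded)
```

The check is deliberately coarse: it catches empty or cut-off files but does not validate the archive contents.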

Session Info
R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: elementary OS 5.1.7 Hera

Matrix products: default
BLAS/LAPACK: /home/jena/miniconda3/lib/libopenblasp-r0.3.12.so

locale:
 [1] LC_CTYPE=cs_CZ.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=cs_CZ.UTF-8        LC_COLLATE=cs_CZ.UTF-8    
 [5] LC_MONETARY=cs_CZ.UTF-8    LC_MESSAGES=cs_CZ.UTF-8   
 [7] LC_PAPER=cs_CZ.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rdryad_1.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5      magrittr_2.0.1  rappdirs_0.3.1  uuid_0.1-4     
 [5] R6_2.5.0        rlang_0.4.8     hoardr_0.5.2    tools_3.6.1    
 [9] htmltools_0.5.0 ellipsis_0.3.1  digest_0.6.27   httpcode_0.3.0 
[13] tibble_3.0.4    lifecycle_0.2.0 crayon_1.3.4    zip_2.1.1      
[17] IRdisplay_0.7.0 repr_1.1.0      base64enc_0.1-3 vctrs_0.3.5    
[21] triebeard_0.3.0 IRkernel_1.1.1  curl_4.3        crul_1.0.0     
[25] evaluate_0.14   mime_0.9        pbdZMQ_0.3-3.1  compiler_3.6.1 
[29] pillar_1.4.7    urltools_1.7.3  jsonlite_1.7.1  pkgconfig_2.0.3

Suggestion: Progress Counter

Just a minor suggestion. It would be nice to have some kind of "progress counter" output on the console. I just downloaded a very large dataset with rdryad::dryad_download() and it took a long time to complete. I started wondering whether I was having connection issues or my R session had broken. A percentage counter would help me check whether the download had stalled.
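Base R already ships a simple console progress bar that a wrapper could use. The following is a minimal sketch that reports progress per file rather than per byte; the file names and the Sys.sleep() call are placeholders for real downloads.

```r
# Sketch: report progress while processing a list of files.
# `files` and the Sys.sleep() call stand in for real downloads.
files <- c("a.csv", "b.csv", "c.csv")
pb <- utils::txtProgressBar(min = 0, max = length(files), style = 3)

done <- 0
for (f in files) {
  Sys.sleep(0.1)                      # placeholder for downloading `f`
  done <- done + 1
  utils::setTxtProgressBar(pb, done)  # redraw the bar with a percentage
}
close(pb)
```

Per-file progress only helps when a dataset consists of several files; a single large ZIP would need byte-level reporting from the HTTP client.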

rdryad::dryad_download() cannot be used in a CRAN package

Hello!
It seems that one cannot use rdryad::dryad_download() in an R package. The problem is that rdryad::dryad_download() can only download to ~/Library/Caches/R. I used it in the vignette of my mvSLOUCH R package and received the following from CRAN:
The CRAN policy only allows writing in file areas via tools::R_user_dir() (which differ by OS). On macOS, ~/Library/Caches/R is not one of those, yet we see there
drwxr-xr-x 4 ripley staff 128 8 Nov 06:54 rdryad/
from
mvSLOUCH via rdryad (which does not create this in its own checks).
This makes it much harder to sweep up after you.

I would suggest adding the possibility to specify a target download destination for rdryad::dryad_download().
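For reference, the directories that the CRAN policy refers to can be resolved with tools::R_user_dir() (available from R 4.0.0). The sketch below shows that call alongside a session-temporary target, which is usually the safer choice in vignettes and examples. The copied file is a stand-in for a real download, and nothing here is an existing rdryad option.

```r
# Resolve a cache directory that complies with the CRAN policy (R >= 4.0.0).
# This only computes the path; it does not create the directory.
cache_dir <- tools::R_user_dir("rdryad", which = "cache")

# In examples, vignettes, and tests, a session-temporary directory avoids
# leaving files behind on the check machine.
target_dir <- file.path(tempdir(), "rdryad-downloads")
dir.create(target_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-in for a downloaded file, copied to the chosen target directory.
downloaded <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), downloaded)
copied <- file.copy(downloaded, file.path(target_dir, basename(downloaded)))
```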

Best wishes
Krzysztof Bartoszek

How to get files' ids?

Hi, sorry for the basic question, but I don't know how to get file ids so that I can download individual files from a Dryad dataset.

I tried looking at our published dataset with:

> dryad_dataset("10.5061/dryad.7nt8f")
# truncated output
$`10.5061/dryad.7nt8f`$id
[1] 6817

However, if I use that id to get files, the result points to a different DOI:

> dryad_files(6817)
# truncated output
$`6817`$`_links`$`stash:dataset`$href
[1] "/api/v2/datasets/doi%3A10.5061%2Fdryad.nf757"

i.e. the returned DOI is 10.5061/dryad.nf757 instead of 10.5061/dryad.7nt8f.
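The mismatch can be checked programmatically by decoding the percent-encoded href from the output above and comparing it with the DOI that was requested. This sketch uses only base R and the values shown in this issue.

```r
# href as returned in the `stash:dataset` link above.
href <- "/api/v2/datasets/doi%3A10.5061%2Fdryad.nf757"

# Decode %3A and %2F, then strip everything up to and including "doi:".
decoded <- utils::URLdecode(href)
returned_doi <- sub("^.*/datasets/doi:", "", decoded)

# Compare with the DOI that was originally requested.
requested_doi <- "10.5061/dryad.7nt8f"
same_dataset <- identical(returned_doi, requested_doi)
```

Here same_dataset is FALSE, which suggests that the dataset id is not a valid argument for dryad_files().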

So how do I get:

  • the proper ids for my dataset, to be used in functions like dryad_files?
  • a link to a particular file (e.g. Appendix S2.txt in the DOI link above)?

Session Info
R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: elementary OS 5.1.7 Hera

Matrix products: default
BLAS/LAPACK: /home/jena/miniconda3/lib/libopenblasp-r0.3.12.so

locale:
 [1] LC_CTYPE=cs_CZ.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=cs_CZ.UTF-8        LC_COLLATE=cs_CZ.UTF-8    
 [5] LC_MONETARY=cs_CZ.UTF-8    LC_MESSAGES=cs_CZ.UTF-8   
 [7] LC_PAPER=cs_CZ.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rdryad_1.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5      magrittr_2.0.1  rappdirs_0.3.1  uuid_0.1-4     
 [5] R6_2.5.0        rlang_0.4.8     hoardr_0.5.2    tools_3.6.1    
 [9] htmltools_0.5.0 ellipsis_0.3.1  digest_0.6.27   httpcode_0.3.0 
[13] tibble_3.0.4    lifecycle_0.2.0 crayon_1.3.4    zip_2.1.1      
[17] IRdisplay_0.7.0 repr_1.1.0      base64enc_0.1-3 vctrs_0.3.5    
[21] triebeard_0.3.0 IRkernel_1.1.1  curl_4.3        crul_1.0.0     
[25] evaluate_0.14   mime_0.9        pbdZMQ_0.3-3.1  compiler_3.6.1 
[29] pillar_1.4.7    urltools_1.7.3  jsonlite_1.7.1  pkgconfig_2.0.3

Really hard to install the rdryad package on Ubuntu 12.04

This is because it depends on libcurl and many other packages compiled from other languages, which install.packages() will not download automatically. It would be better to give some information on this, as the RCurl package does (see the system requirements at http://cran.r-project.org/web/packages/RCurl/index.html), like the following:

"libcurl (version 7.14.0 or higher) http://curl.haxx.se. On Linux systems, you will often have to explicitly install libcurl-devel to have the header files and the libcurl library."

Ryan at Dryad says

http://wiki.datadryad.org/External_Metadata_Use#ROpenSci

Appearance:
This is a rich command-line tool. It is possible to extract any metadata available for any data package on the site that is available through OAI-PMH.

Potential Problems:
When querying for total data packages and total data files, the numbers that are returned are substantially higher than what appears on the homepage for datadryad.org. Ryan guesses this is because the number from R includes datasets that have been deleted, and R is failing to account for those that have been tagged deleted.

Recommendations:
We should communicate the above problem to the developers of ROpenSci.

stringi dependency

When I execute library(rdryad) after installation I receive the following error message:

Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
there is no package called ‘stringi’
Error: package or namespace load failed for ‘rdryad’

Works fine after I run install.packages("stringi").
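A small check along the following lines can confirm which packages are missing before loading rdryad. missing_packages is a hypothetical helper, and the install call is left commented out so that the sketch has no side effects beyond loading namespaces that are already installed.

```r
# Return the subset of `pkgs` that cannot be loaded in this R library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("stringi", "rdryad"))
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```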

Updating a metadata download

I wrote a new function called updatealldryad_metadata. The idea is that if you already have a downloaded dryadmetadata.csv, you can run update to get newer records and not wait forever. The function will let you overwrite the file or create a new one. If you don't specify a new filename, it just appends date_time to the current filename.

Issue: It seems like there is no way to just get the identifiers and do a diff. This is likely due to the fact that in the getalldryad_metadata function, it downloads everything, then removes records with no metadata. This seems to throw off the diff. I will work on this again over the weekend but if you guys have any quick fixes, that would be great.
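The diff step itself is straightforward once both sets of identifiers are available. The sketch below uses made-up identifiers to show the idea; in practice the first vector would come from the existing CSV and the second from a fresh identifier listing, taken before any records are filtered out.

```r
# Made-up identifiers for illustration only.
have    <- c("id-001", "id-002", "id-003")            # already in the CSV
current <- c("id-002", "id-003", "id-004", "id-005")  # currently in the repository

to_fetch <- setdiff(current, have)   # new records to download
dropped  <- setdiff(have, current)   # records no longer listed
```

Keeping the identifiers of records that had no metadata, instead of discarding them, would stop those records from reappearing as "new" on every update.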

Error: Internal Server Error (HTTP 500)

> dryad_download(dois = "10.5061/dryad.f385721n")
Error: Internal Server Error (HTTP 500)

Any reason why the example code is not working?

> devtools::session_info()
─ Session info ───────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS Ventura 13.6.1
 system   x86_64, darwin20
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Detroit
 date     2024-02-01
 rstudio  2023.12.1+402 Ocean Storm (desktop)
 pandoc   NA

─ Packages ───────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 cachem        1.0.8   2023-05-01 [1] CRAN (R 4.3.0)
 callr         3.7.3   2022-11-02 [1] CRAN (R 4.3.0)
 cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
 crayon        1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
 crul          1.4.0   2023-05-17 [1] CRAN (R 4.3.0)
 curl          5.2.0   2023-12-08 [1] CRAN (R 4.3.0)
 devtools      2.4.5   2022-10-11 [1] CRAN (R 4.3.0)
 digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.3.0)
 fansi         1.0.5   2023-10-08 [1] CRAN (R 4.3.0)
 fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
 fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
 glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
 hoardr        0.5.4   2024-01-23 [1] CRAN (R 4.3.2)
 htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.0)
 htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.3.0)
 httpcode      0.3.0   2020-04-10 [1] CRAN (R 4.3.0)
 httpuv        1.6.14  2024-01-26 [1] CRAN (R 4.3.2)
 jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
 later         1.3.2   2023-12-06 [1] CRAN (R 4.3.0)
 lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.0)
 magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
 memoise       2.0.1   2021-11-26 [1] CRAN (R 4.3.0)
 mime          0.12    2021-09-28 [1] CRAN (R 4.3.0)
 miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
 pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
 pkgbuild      1.4.2   2023-06-26 [1] CRAN (R 4.3.0)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
 pkgload       1.3.3   2023-09-22 [1] CRAN (R 4.3.0)
 prettyunits   1.2.0   2023-09-24 [1] CRAN (R 4.3.0)
 processx      3.8.2   2023-06-30 [1] CRAN (R 4.3.0)
 profvis       0.3.8   2023-05-02 [1] CRAN (R 4.3.0)
 promises      1.2.1   2023-08-10 [1] CRAN (R 4.3.0)
 ps            1.7.5   2023-04-18 [1] CRAN (R 4.3.0)
 purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
 rappdirs      0.3.3   2021-01-31 [1] CRAN (R 4.3.0)
 Rcpp          1.0.11  2023-07-06 [1] CRAN (R 4.3.0)
 rdryad      * 1.0.0   2020-06-25 [1] CRAN (R 4.3.0)
 remotes       2.4.2.1 2023-07-18 [1] CRAN (R 4.3.2)
 rlang         1.1.2   2023-11-04 [1] CRAN (R 4.3.0)
 rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
 shiny         1.8.0   2023-11-17 [1] CRAN (R 4.3.0)
 stringi       1.8.3   2023-12-11 [1] CRAN (R 4.3.0)
 stringr       1.5.1   2023-11-14 [1] CRAN (R 4.3.0)
 tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
 triebeard     0.4.1   2023-03-04 [1] CRAN (R 4.3.0)
 urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.3.0)
 urltools      1.7.3   2019-04-14 [1] CRAN (R 4.3.0)
 usethis       2.2.2   2023-07-06 [1] CRAN (R 4.3.0)
 utf8          1.2.4   2023-10-22 [1] CRAN (R 4.3.0)
 vctrs         0.6.4   2023-10-12 [1] CRAN (R 4.3.0)
 xtable        1.8-4   2019-04-21 [1] CRAN (R 4.3.0)
 zip           2.3.1   2024-01-27 [1] CRAN (R 4.3.2)

 [1] /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library

──────────────────────────────────────────────────────────────────────

Errors in example.R

This works:

dryaddat <- download_url("10255/dryad.1759")
dat <- read.csv(dryaddat)

but then this fails:

dat <- read.csv(dryaddat, ";") # This file happens to be ; delimited instead.
Error in !header : invalid argument type
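The error arises because the second positional argument of read.csv() is header, so ";" is taken as the header flag rather than the separator. Naming the argument avoids this. The sketch below uses a small temporary file in place of the Dryad download.

```r
# Stand-in for the downloaded file: a small semicolon-delimited CSV.
path <- tempfile(fileext = ".csv")
writeLines(c("id;value", "1;3.5", "2;4.0"), path)

# Pass the separator by name so it is not mistaken for `header`.
dat <- read.csv(path, sep = ";")
str(dat)
```

read.csv2() also defaults to ";" as the separator, but it expects "," as the decimal mark, so it is only appropriate for files written with that convention.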

This fails:

# Get all OAIs
alldryadoais <- get_dryadoais()
Error: could not find function "get_dryadoais"

This fails:

metadat <- download_dryadmetadata("10255/dryad.1759", TRUE)
Error in OAI_PMH_issue_request(baseurl, request) : 
  Received condition 'idDoesNotExist' with diagnostic:
"10255/dryad.1759" is unknown or illegal in this repository

This causes the examples that use metadat to fail.

Archive Package

@mpadge could you please add yourself to DESCRIPTION as I was told you are the new maintainer? 😉

`dryad_fetch`: may need to rethink which URLs to use to fetch files

URLs like http://api.datadryad.org/mn/object/doi:xxx sometimes work and sometimes don't. For example,

http://api.datadryad.org/mn/object/doi:10.5061/dryad.1758/1/bitstream

used to work, but now doesn't. You can still get the file via http://datadryad.org/bitstream/handle/10255/dryad.1759/dataset.csv?sequence=1, which I think can be found through the DC metadata on the landing page for the DOI, maybe http://datadryad.org/resource/doi:10.5061/dryad.1758

Solr!
