ropensci / ramlegacy

R :package: to download, cache, and read in different versions of the RAM Legacy Stock Assessment Database, an online compilation of stock assessment results for commercially exploited marine populations from around the world.

Home Page: https://docs.ropensci.org/ramlegacy

ramlegacy's Introduction

ramlegacy

Authors: Kshitiz Gupta, Carl Boettiger
License: MIT
Package source code on GitHub
Submit Bugs and feature requests

ramlegacy is an R package that supports caching and reading in different versions of the RAM Legacy Stock Assessment Database, an online compilation of stock assessment results for commercially exploited marine populations from around the world. More information about the database can be found here.

What does `ramlegacy` do?

Provides a function, download_ramlegacy(), to download any available version of the RAM Legacy Stock Assessment Excel Database and cache it on the user’s computer as a serialized RDS object. Once a version has been downloaded, it doesn’t need to be re-downloaded for subsequent analyses.
Supports reading in specified tables, or all tables, from a cached version of the database through a function, load_ramlegacy().
Provides a function, ram_dir(), to view the path of the location where the downloaded database was cached.

Installation

You can install the development version from GitHub with:

install.packages("devtools")
library(devtools)
install_github("ropensci/ramlegacy")

To ensure that the vignette is installed along with the package, remove --no-build-vignettes from the build_opts in install_github.
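
A minimal sketch of that step. It assumes the default build_opts of install_github() are --no-resave-data, --no-manual and --no-build-vignettes, and that a build_vignettes argument is available, as in recent devtools/remotes releases; check ?install_github for the version you have installed. install_with_vignettes() is a hypothetical wrapper, used here only so that nothing is downloaded until it is called.

```r
# Default build options assumed for install_github(); verify them against
# your installed devtools/remotes version.
default_opts <- c("--no-resave-data", "--no-manual", "--no-build-vignettes")

# Drop the flag that suppresses vignette building.
opts <- setdiff(default_opts, "--no-build-vignettes")

# Hypothetical wrapper: the install needs network access, so it only runs
# when this function is called explicitly.
install_with_vignettes <- function() {
  devtools::install_github(
    "ropensci/ramlegacy",
    build_opts = opts,
    build_vignettes = TRUE
  )
}
```

After install_with_vignettes() finishes, vignette(package = "ramlegacy") should list the installed vignette.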

Usage

Please see the ramlegacy vignette for more detailed examples and additional package functionality.

Start by loading the package using library.

library(ramlegacy)

download_ramlegacy

download_ramlegacy() downloads the specified version of the RAM Legacy Stock Assessment Excel Database and saves it as an RDS object in the user’s application data directory, as detected by the rappdirs package. This is also where load_ramlegacy() looks for the downloaded database by default.

# downloads version 4.44
download_ramlegacy(version = "4.44")

If version is not specified, download_ramlegacy defaults to downloading the current latest version (4.44):

# downloads current latest version 4.44
download_ramlegacy()

The latest versions of the RAM Legacy Database are archived in Zenodo, but the older versions (v4.3, v3.0, v2.5, v2.0, v1.0) are not. To ensure access to these older versions of the database, download_ramlegacy supports downloading them from this GitHub repository:

# downloads older version 4.3
download_ramlegacy(version = "4.3")

load_ramlegacy

After the specified version of the database has been downloaded and cached on your local machine through download_ramlegacy, you can call load_ramlegacy to obtain a list of specific tables, or all the tables, from that version of the database. If tables is specified but version is not, load_ramlegacy returns a list containing the specified dataframes from the latest version (currently 4.44). If neither version nor tables is specified, load_ramlegacy returns a list containing all the dataframes in the latest version (currently 4.44).

# get a list containing area and bioparams tables from
# version 4.3 of the database
load_ramlegacy(version = "4.3", tables = c("area", "bioparams"))

# get a list containing area and bioparams tables from version 4.44
# of the database
load_ramlegacy(version = "4.44", tables = c("area", "bioparams"))

# if tables is specified but version is not then the function defaults
# to returning a list containing the specified tables from the current
# latest version 4.44
load_ramlegacy(tables = c("area", "bioparams"))

# since both tables and version are not specified the function returns
# a list containing all the tables from the current latest version 4.44
load_ramlegacy()
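
Because load_ramlegacy() returns a list, individual tables can be pulled out by name. The sketch below uses a small stand-in list so that it runs without the database. The column names and values are made up, and it is an assumption here that the list elements are named after the requested tables.

```r
# Stand-in for the result of load_ramlegacy(tables = c("area", "bioparams")).
# Columns and values are invented purely for illustration.
ram <- list(
  area = data.frame(areaid = c("A1", "A2"), country = c("X", "Y")),
  bioparams = data.frame(stockid = "S1", bioid = "B1")
)

# Which tables were returned?
names(ram)

# Extract one table as a data frame.
area <- ram[["area"]]

# Number of rows in every returned table.
row_counts <- vapply(ram, nrow, integer(1))
```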

To learn more about the different tables present in the database, what the various acronyms mean, and the different stock summaries accompanying the databases, please see this page.

ram_dir

To view the exact path where a certain version of the database was downloaded and cached by download_ramlegacy, you can run ram_dir(vers = 'version'), specifying the version number inside the function call:

# download version 4.44
download_ramlegacy(version = "4.44")

# view the location where version 4.44 of the database was
# downloaded and cached
ram_dir(vers = "4.44")
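
That path can also be used to skip a redundant download. The sketch below assumes that the path returned by ram_dir() exists on disk only once that version has been cached. ensure_cached() is a hypothetical helper, not part of the package; the path and download functions are passed in as arguments so the logic can be tried without network access.

```r
# Hypothetical helper: download a version only if its cache location is
# missing. By default it delegates to the package's own functions.
ensure_cached <- function(version,
                          dir_fn = function(v) ramlegacy::ram_dir(vers = v),
                          download_fn = function(v) {
                            ramlegacy::download_ramlegacy(version = v)
                          }) {
  path <- dir_fn(version)
  if (!file.exists(path)) {
    # Nothing cached at this location yet, so fetch the requested version.
    download_fn(version)
  }
  invisible(path)
}
```

Calling ensure_cached("4.44") at the top of a script would then leave an existing cache untouched.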

Similar Projects

ramlegacy: Sean Anderson has a namesake package that appears to be a stalled project on GitHub (last updated 9 months ago). However, unlike this package, which supports downloading and reading in the Excel version of the database, Sean Anderson’s project downloads the Microsoft Access version and converts it to a local sqlite3 database.
RAMlegacyr: an older package, last updated in 2015. Similar to Sean Anderson’s project, it seems to be an R interface for the Microsoft Access version of the RAM Legacy Stock Assessment Database and provides a set of functions that use RPostgreSQL to connect to the database.

Citation

Current and older versions of the RAM Legacy Database are archived in Zenodo, each version with its own unique DOI. The suggested format for citing data is:

RAM Legacy Stock Assessment Database. 2018. Version 4.44-assessment-only. Released 2018-12-22. Accessed [Date accessed YYYY-MM-DD]. Retrieved from DOI:10.5281/zenodo.2542919.

The primary publication describing the RAM Legacy Stock Assessment Database, and suggested citation for general use is:

Ricard, D., Minto, C., Jensen, O.P. and Baum, J.K. (2012) Evaluating the knowledge base and status of commercially exploited marine species with the RAM Legacy Stock Assessment Database. Fish and Fisheries 13 (4) 380-398. DOI: 10.1111/j.1467-2979.2011.00435.x

Several publications have relied on the RAM Legacy Stock Assessment Database.

ramlegacy's Issues

download_ramlegacy should have an argument to overwrite

@kshtzgupta1 just noticed this in testing the package in class -- it's nice for any interactive function to also have an optional argument to prevent the interactive behavior. e.g. download_ramlegacy() should have an argument overwrite with default being "ask", but which can be set to TRUE or FALSE in the function call to avoid the interactive prompt when you don't want it.

Are there any other interactive steps we want to fix as well?
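
One way the proposed tri-state argument could be resolved, as a sketch only. should_overwrite() and its ask_fn and is_interactive parameters are hypothetical names for illustration and are not taken from the package source.

```r
# Hypothetical helper: decide whether an existing cached file should be
# replaced. `overwrite` is TRUE, FALSE, or "ask"; `ask_fn` is only called
# for "ask" and should return TRUE or FALSE.
should_overwrite <- function(overwrite = "ask",
                             ask_fn = function() {
                               ans <- readline("Overwrite the cached version? (y/n): ")
                               tolower(trimws(ans)) %in% c("y", "yes")
                             },
                             is_interactive = interactive()) {
  # An explicit TRUE or FALSE never triggers a prompt.
  if (isTRUE(overwrite)) return(TRUE)
  if (isFALSE(overwrite)) return(FALSE)
  if (!identical(overwrite, "ask")) {
    stop('`overwrite` must be TRUE, FALSE, or "ask"', call. = FALSE)
  }
  # Nobody can answer a prompt in a non-interactive session, so keep the file.
  if (!is_interactive) return(FALSE)
  isTRUE(ask_fn())
}
```

With this shape, should_overwrite(TRUE) and should_overwrite(FALSE) return immediately, and only the "ask" setting can reach the prompt.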

update urls

@kshtzgupta1 I think we still need to update the URLs in the DESCRIPTION and anywhere else in the documentation (pkgdown etc) to have the ropensci github address now.

New Maintainer Wanted :-)

Or new maintainer team. 😸

If you're interested, please comment in the issue.
For more info, see

download_ramlegacy() still prompts even when user sets overwrite = TRUE

download_ramlegacy(overwrite=TRUE) should result in the function just downloading a new file, but instead, it still asks if the user wants to overwrite it. The function should only be interactive if overwrite is set explicitly to "ask", since it doesn't make sense to ask a user whether to overwrite if the user has set overwrite already when calling the function.

This is a duplicate of #4 and important for use in Rmd notebooks, etc, which count as being in "interactive" sessions

rOpenSci review: Sam

@boshek Thank you for a detailed review! We have made many changes following your suggestions. I will update the README and vignette to reflect those changes after they are approved.

Though this is likely to end up on CRAN, the current installation instructions don't automatically build the vignette, so accessing the vignette locally fails. I'd just suggest removing the line:

The vignette can also be viewed by calling vignette(package = "ramlegacy")

until the installation instructions are for the CRAN version. Alternatively, one could install by removing --no-build-vignettes from the build_opts in install_github.

Fixed that line!

Vignette(s) demonstrating major functionality that runs successfully locally

Use quoting markdown for RAM legacy description

This line:
> https://github.com/kshtzgupta1/ramlegacy/blob/8c26a196cc573a03574ed084a7e6c9d0b951e7a2/vignettes/ramlegacy.Rmd#L37
I believe is a typo and should read library(ramlegacy)

Fixed!

Function Documentation: for all exported functions in R help

Note that this function does not support vectorization so please don't pass in a vector of version numbers

Consider revising the above since you are unable to do so anyways

Revised!

Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Consider adding either suggested reference or the maintainer of ramlegacy to the DESCRIPTION file in Authors@R

Consider adding a CODE OF CONDUCT and CONTRIBUTING file directly to the repo.

Added!

I think that you are adding too much onto the library call. Any loading of data should be left solely to the load_ramlegacy function. I think this constrains future development of the package, as future versions may include functionality beyond the database. This is also outside what a user would expect from a typical library call. We are into stylistic territory here, but I personally think objects should not be loaded with the library call.

Agreed! We have removed the loading behavior of library(ramlegacy) and all loading is now solely done by load_ramlegacy.

I wonder if load_ramlegacy should gain an argument where a user can specify a vector that controls which datasets are loaded. This might be a useful feature for production code, particularly a shiny app.

Excellent suggestion! We have added a dfs argument in load_ramlegacy to which the user can pass a vector of dataframes to load.

Consider not even exposing the ram_url or ram_path arguments, as described here:
https://github.com/kshtzgupta1/ramlegacy/blob/8c26a196cc573a03574ed084a7e6c9d0b951e7a2/R/download_ramlegacy.R#L23-L25
If a user is never meant to change it, then perhaps the argument could be omitted abstracting that possible confusion away from the user. What is the utility of those arguments if they are never meant to change?

Having ram_url as an argument helps us write tests for how download_ramlegacy deals with internet connection issues. Although the package downloads and caches the database in the user's rappdirs directory by default, we have now decided that having ram_path as an argument is useful in download_ramlegacy and load_ramlegacy because it allows users to download the database to a location of their choice and read from that location if they wish. I think we want to give the user that flexibility.

I would strongly consider a NEWS.md file to track changes between versions. rOpenSci provides a great template

Added!

I think the package lacks some connection with the actual data. Right now a bunch of dataframes are loaded but a new user is expected to have prior knowledge of the ram database. I think this package could provide some path for users to learn about the data itself.

We are definitely open to including more educational vignettes if the maintainers of the database want that, and we are willing to link to more informative documentation in the vignette. At the same time, we want to avoid writing things that are out of date or that do not come from the maintainers. Also, in our opinion, educating a new user about the database is somewhat outside the scope of the package, whose primary aim is to improve access to the data.

It isn't clear to me why you need different versions of the database. This functionality is highlighted quite prominently but its utility is not clear.

We believe providing access to older versions can be really useful for users trying to reproduce older research papers and studies.

RE: the alternate download location on github. I would consider if you have acknowledged and licensed this data appropriately in your assets repo. I would contact the RAM maintainers directly and seek direction on this. Fully reproducing the data might cause some concern with those maintainers.

Prof. Boettiger was in touch with the RAM maintainers regarding that. While the maintainers have moved the latest versions (4.40, 4.41, 4.44) to Zenodo, they have yet to do the same for the older versions. Until that happens, the package will have to use the GitHub repo to make the older versions available to users.

The docs directory, _pkgdown.yml and _config.yml all needed to be added to .Rbuildignore for R CMD check to pass on my machine.

Added to .Rbuildignore!

I am wondering if this is intentional behaviour. I think this is a carryover from readxl::read_excel, but given the class of the data object, it prints either as a dataframe or as a tibble depending on whether tibble is loaded or not. One thing to consider is even just adding tibble to Suggests for nicer printing in a vignette.

It was intentional. We wanted to give the user the option to choose whether they wanted the tables as tibbles or dataframe. If for some reason a user prefers data.frame to tibble then we certainly don't want to impose tibbles on them.
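
Users who want a plain data.frame regardless of what is attached can convert a table explicitly. The sketch below builds a stand-in object that carries the class vector tibbles have (tbl_df, tbl, data.frame), with made-up values, and coerces it with base R. It is a sketch for illustration, not code from the package.

```r
# Stand-in for one loaded table; the values are illustrative only.
tbl <- structure(
  list(stockid = c("S1", "S2"), year = c(2000L, 2001L)),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -2L)
)

# as.data.frame() drops the tibble classes, leaving a plain data.frame.
plain <- as.data.frame(tbl)
class(plain)
```

Going the other way, tibble::as_tibble() converts a plain data.frame to a tibble when the tibble package is installed.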

I get these errors when I run goodpractice::gp

  ✖ fix this R CMD check WARNING: LaTeX errors when creating PDF
    version. This typically indicates Rd problems.
  ✖ fix this R CMD check ERROR: Re-running with no redirection of
    stdout/stderr. Hmm ... looks like a package You may want to clean
    up by 'rm -Rf
    C:/Users/salbers/AppData/Local/Temp/Rtmp4yG4PY/Rd2pdf43d827bd526'

We couldn't reproduce these on our machines. I think they might be specific to your machine and may be occurring because you don't have TeX installed.

I think some slight improvement could be made by running styler on the package though it is marginal.

Agreed! I will run it after all the changes have been approved.

I wonder if it would be useful to engage Sean Anderson to see if he would be willing to submit a pull request with his code that converts the Access database into an sqlite database. I think there would be an appetite for the sqlite option which could exist alongside the Excel version. His process requires an additional utility outside of R but maybe a R-only solution is available.

We are definitely open to working with Sean Anderson. I think Prof. Boettiger has pinged him.

rOpenSci review: Jamie

@jafflerbach Thank you for a detailed review! We have made many changes following your suggestions. I will update the README and vignette to reflect those changes after they are approved.

Vignette(s) demonstrating major functionality that runs successfully locally
The Vignette is good except it does not explain how to install the package. You can only find installation instructions in the README

Fixed!

When running goodpractice::gp() I got the following output:

  ✖ write unit tests for all functions, and all package code in general. 69% of code lines are covered by test
    cases.

    R/download_ramlegacy.R:53:NA
    R/download_ramlegacy.R:58:NA
    R/download_ramlegacy.R:59:NA
    R/download_ramlegacy.R:68:NA
    R/download_ramlegacy.R:69:NA
    ... and 58 more lines

  ✖ fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd
    problems.

We couldn't reproduce that warning on our machines. I think it might be specific to your machine and may be occurring because you don't have TeX installed.

That being said, I strongly encourage the developers of this package to work with Sean and the maintainers of the RAM legacy database to get this package functional.

I believe Prof. Boettiger has reached out to Sean regarding that.

I first tried download_ramlegacy() when the site was down (1/17/2019). I downloaded from the backup location on GitHub. This worked well. A couple of hours later, the website was back up, and when I tried to download the only version now available on the site (4.4), I got an error message:

> download_ramlegacy(version = "4.4")
Error: Invalid version number. Available versions are 1.0, 2.0, 2.5, 3.0, 4.3,

This should be resolved now!
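
A sketch of the kind of check that produces an error like the one above. It assumes the set of versions named in this document (1.0, 2.0, 2.5, 3.0, 4.3, 4.40, 4.41, 4.44); check_version() is a hypothetical helper and not the package's actual implementation. It also shows why version numbers are safer passed as strings than as numbers.

```r
# Versions mentioned in this document; the package's real list may differ.
known_versions <- c("1.0", "2.0", "2.5", "3.0", "4.3", "4.40", "4.41", "4.44")

# Hypothetical validator: accept only a single known version string.
check_version <- function(version) {
  if (!is.character(version) || length(version) != 1L) {
    stop("`version` must be a single string such as \"4.44\"", call. = FALSE)
  }
  if (!version %in% known_versions) {
    stop("Invalid version number. Available versions are ",
         paste(known_versions, collapse = ", "), call. = FALSE)
  }
  invisible(version)
}

# Numeric input loses trailing zeros, so 4.40 can no longer be told
# apart from 4.4 once it is converted to text.
as_text <- as.character(4.40)
```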

I can foresee some user issues with the ram_url argument in download_ramlegacy():

download_ramlegacy(version = NULL, ram_path = NULL,
  ram_url = "https://depts.washington.edu/ramlegac/wordpress/databaseVersions")

First, it looks like there is a typo (ramlegac instead of ramlegacy) so if the download_ramlegacy() function isn't acting as the user would expect (whether it's their fault or not) they might see the URL in the Help document and manually try to fix the typo. This may be an actual typo, I don't know. Either way, if you don't want the user to ever touch it I would hardcode that ram_url into the function itself rather than having it as an argument.

It was actually ramlegac in the URL. But that is no longer relevant since we are now passing the Zenodo DOI URL to ram_url. Having ram_url as an argument helps us test the intended behavior of download_ramlegacy in the face of internet connection issues and network problems.

Also the path argument in load_ramlegacy() is again one that should not be touched by the user. I tried to mess with this path argument and when I entered an incorrect path I received this:

> load_ramlegacy(version = 4.3, path = "~/github")
★ Loading version 4.3 ...
Error in readRDS(path) : error reading from connection
In addition: Warning message:
In readRDS(path) : error reading the file

I suggest changing the function so that if something is entered in as a path argument, it returns a message reminding the user to not change the path argument. Or, remove this as an argument entirely.

Although the package downloads and caches the database in the user's rappdirs directory by default, we have now decided that having ram_path as an argument is useful in download_ramlegacy and load_ramlegacy because it allows users to download the database to a location of their choice and read from that location if they wish. I think we want to give the user that flexibility.
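
If the argument stays, the cryptic readRDS() failure reported above could be avoided by checking the path before reading. A sketch, assuming the path is expected to point at a single .rds file; read_cached_rds() is a hypothetical helper, not a function in the package.

```r
# Hypothetical wrapper: fail with a clear message before readRDS() is reached.
read_cached_rds <- function(path) {
  # A missing path or a directory cannot be a cached .rds file.
  if (!file.exists(path) || dir.exists(path)) {
    stop("No cached database file found at '", path,
         "'. Run download_ramlegacy() first or check `ram_path`.",
         call. = FALSE)
  }
  readRDS(path)
}
```

Passing a directory such as "~/github" would then stop with the message above instead of "error reading from connection".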