Giter VIP home page Giter VIP logo

nbar's Introduction

nbaR

Project Status: Unsupported peer-review at rOpenSci

This package has been archived. The former README is now in README-not.

nbar's People

Contributors

hettling avatar rvosa avatar jeroen avatar maelle avatar kevcaz avatar

Stargazers

Andreas Motl avatar Jeroen Van Goey avatar Tobias Augspurger avatar

Watchers

 avatar Andreas Motl avatar James Cloos avatar  avatar  avatar Maarten Schermer avatar  avatar  avatar  avatar Jaap Pluimgraaff avatar gbbio avatar  avatar

Forkers

mbjoseph kevcaz

nbar's Issues

Schematic?

It might be nice in the getting started vignette to provide a schematic of the query steps to highlight the commonalities (e.g. $new, $resultset … etc.)


# specify data service
client <- DATASERVICE$new()
# create query conditions
condition1 <- QueryCondition$new()
condition2 <- QueryCondition$new()
...
# specify query
query <- QuerySpec$new(conditions(list(condition1, condition2 ...))
# search
res <- res$query(querySpec = query)

duplicated code

The code has a lot of repetition, and is not particularly human readable presumably because it is generated automatically from the API. I'm not sure how to evaluate this part of the package: on the one hand, it is awesome that the code is programmatically generated to deal with future changes in the API, but on the other hand, there are repetitive blocks of code like the following, which itself is repeated twice in the definition of the Specimen class:

self[["sourceSystemId"]] <-
  SpecimenList[["sourceSystemId"]]
self[["recordURI"]] <-
  SpecimenList[["recordURI"]]
self[["id"]] <-
  SpecimenList[["id"]]
self[["unitID"]] <-
  SpecimenList[["unitID"]]
self[["unitGUID"]] <-
  SpecimenList[["unitGUID"]]
self[["collectorsFieldNumber"]] <-
  SpecimenList[["collectorsFieldNumber"]]
self[["assemblageID"]] <-
  SpecimenList[["assemblageID"]]
self[["sourceInstitutionID"]] <-
  SpecimenList[["sourceInstitutionID"]]
self[["sourceID"]] <-
  SpecimenList[["sourceID"]]
self[["previousSourceID"]] <-
  SpecimenList[["previousSourceID"]]
self[["owner"]] <-
  SpecimenList[["owner"]]
self[["licenseType"]] <-
  SpecimenList[["licenseType"]]
self[["license"]] <-
  SpecimenList[["license"]]
self[["recordBasis"]] <-
  SpecimenList[["recordBasis"]]
self[["kindOfUnit"]] <-
  SpecimenList[["kindOfUnit"]]
self[["collectionType"]] <-
  SpecimenList[["collectionType"]]
self[["sex"]] <-
  SpecimenList[["sex"]]
self[["phaseOrStage"]] <-
  SpecimenList[["phaseOrStage"]]
self[["title"]] <-
  SpecimenList[["title"]]
self[["notes"]] <-
  SpecimenList[["notes"]]
self[["preparationType"]] <-
  SpecimenList[["preparationType"]]
self[["previousUnitsText"]] <-
  SpecimenList[["previousUnitsText"]]
self[["numberOfSpecimen"]] <-
  SpecimenList[["numberOfSpecimen"]]
self[["fromCaptivity"]] <-
  SpecimenList[["fromCaptivity"]]
self[["objectPublic"]] <-
  SpecimenList[["objectPublic"]]
self[["multiMediaPublic"]] <-
  SpecimenList[["multiMediaPublic"]]

One reason to prefer DRY code is to avoid making the same change over and over, but if a machine is tasked with making those changes, maybe DRY matters less? (As an aside: I was somewhat unsure in reviewing whether my suggestions relate more to swagger or nbaR!) The structure of the code might make it difficult for users to glean much information by reading the source code, relative to some other packages. From my perspective this could be fine, as long as there is good documentation (for humans) that gives users the information they need.

Pkgdown issues

Warning: In '_pkgdown.yml', topic must be a valid R expression.
Problem topic: `\`Element\``
Warning: In '_pkgdown.yml', topic must be a valid R expression.
Problem topic: `\`GeoAreaClient\``
Warning: In '_pkgdown.yml', topic must be a valid R expression.
Problem topic: `\`Element\``
Warning: In '_pkgdown.yml', topic must be a valid R expression.
Problem topic: `\`GeoAreaClient\``
Warning: Topics missing from index: chronos_calib, geo_age
These above errors can be fixed by updating the pkgdown yaml. (Unless you are using tic to auto-generate the yaml?)

Additionally, it might be nice to organise the functions by utility to a user (e.g. SpecimentClient, TaxonClient should take precedence over Agent). In packages I have written in the past I have organised the functions/classes by public and private categories.

Also, Changelog leads to a 404.

Specimen class docs

Further fields include (among others) information about Finding place, identification, and multimedia content.

It seems like "Finding" should not be capitalized (but maybe this is a swagger issue?)

'application note' vignette

Draft a vignette (±2-3 pages) that outlines the main functionality as relevant to a scientific end user: what data can I get and how do I get it?

Package vignette

Make package vignette describing workflows currently possible

"advanced" data munging examples

Seeing that the data structures that one gets back are somewhat complex, it would be nice to see a few examples to show how to extract useful data from queries. For example: how to make a data frame with useful information coming out of a series of queries (like, a data frame with fossil calibration points for primate taxa).

Named list

Named lists in query conditions do not work?

library(nbaR)
sc <- SpecimenClient$new()
field <- 'identifications.defaultClassification.genus'
qc <- QueryCondition$new(field = field, operator = 'EQUALS', value = 'Solanum')
# success
qs <- QuerySpec$new(conditions = list(qc))
res <- sc$query(querySpec = qs)
res$content$totalSize
# fail
qc_list <- list('solanum' = qc)
qs <- QuerySpec$new(conditions = qc_list)
res <- sc$query(querySpec = qs)
res$content$totalSize

Test results

When I first downloaded the repo, I got no errors when testing the package. But since 8 Nov 2018, I started getting the error and warnings below. This may just be due to the NBA API itself, but is it possible for the package be able to note the difference between API failure and internal failure?

test-multimediaClient-metadata.R:36: warning: Settings work
Status code:500
Internal Server Error
Exception: Invalid NBA setting: 50000
Exception type: java.lang.IllegalArgumentException
Full stack trace stored in response object
test-specimenClient-find.R:57: warning: Test error handling in find functions
404 (NOT FOUND)
No Specimen exists with ID XXX
test-specimenClient-misc.R:90: failure: getDistinctValuesPerGroup works
res$content has length 3, not length 2.
test-specimenClient-query.R:149: warning: Errors and warnings work
Status code:500
Internal Server Error
Exception: Field associatedMultiMediaUris.accessUri cannot be queried: field is not indexed
Exception type: nl.naturalis.nba.api.InvalidConditionException
Full stack trace stored in response object
test-taxonClient-find.R:35: warning: Test error handling in find functions
404 (NOT FOUND)
No Specimen exists with ID XXX
test-taxonClient-query.R:80: warning: Errors and warnings work
Status code:500
Internal Server Error
Exception: Invalid element "somefield" in path "somefield"
Exception type: nl.naturalis.nba.api.InvalidConditionException
Full stack trace stored in response object
══ Results ════════════════════════════════════════
Duration: 85.4 s

OK:       1420
Failed:   1
Warnings: 8
Skipped:  2

Function wrappers

I have a thought that for the most basic of queries you could simply have wrapper functions, e.g. specimen_search or taxon_search, which would take just names/ids and return R data structures that the majority of R users would understand. These wrappers would likely lead to much less coding on the part of the end-user and, although less flexible, they should make the package much more accessible.

E.g. this below wrapper let's a user run of the examples from "getting started".

specimen_search <- function(value, field) {
  qcs <- vector(mode = 'list', length = length(value))
  for (i in seq_along(value)) {
    qcs[[i]] <- QueryCondition$new(field = field[[i]], operator = 'EQUALS',
                                   value = value[[i]])
  }
  qs <- QuerySpec$new(conditions = qcs)
  sc <- SpecimenClient$new()
  res <- sc$query(querySpec = qs, size = 100)
  lapply(X = res$content$resultSet, FUN = function(x) {
    x$item$toList()
  })
}

res <- specimen_search(value = c('female', 'species', "Equidae"),
                       field = c('sex', "identifications.taxonRank",
                                 "identifications.defaultClassification.family"))

Rd documentation for endpoint methods

At the moment, class methods are documented within the documentation of a class; e.g. ?Specimen would give all fields in that class and list all objects. It would be good to have separate documentation about a single methods, such as ?Specimen$query. Documentation is generated with roxygen, and at the moment, this does not seem to be possible for R6 classes...

Split main vignette

There is a lot of documentation already, and it was really critical for me to be able to start making more of my own queries. I think though I would have liked a little more organisation in the “getting started” article and more real-world examples. I think the shark vignette is great because it provides a real world use for the package, additional articles need not be so long though. In particular I think it would be great if the “Getting started” article were split up (e.g. into real basics and then independent articles for each data domain). I think a visual schematic showing the basic class structures would really help as well. Finally, I was interested in the operators (equals, not equals, contains….) but found these were not sufficiently explained. More examples would help.

vignette client variable

ook in het vignette, er staat eerst client <- SpecimenClient$new() en res <- client$query(), maar verderop gaat het verder met res <- sc$query(querySpec=qs) (dus sc ipv client). is denk ik handiger als dat consequent is.

Typos shark vignette

"Baeysian"
"Maximum-Likelihood" can be lowercase: "maximum likelihood" or "maximum-likelihood"
"For many specimen," should be "specimens"
"levels; If ...", capitalization of "If" is not necessary with the semicolon
"...there are sufficient specimen determined", should be plural "specimens"

documented unit tests

The best examples are working code samples, so it would be great if the unit tests could function in that way, i.e. they really exercise the code in realistic cases, with ample comments to talk the innocents through the code.

Explaining warnings

https://github.com/naturalis/nbaR/blob/master/vignettes/sharks.Rmd#L190

This command raised two warnings:

times <- geo_age(unique(c(as.character(data$youngChronoName), as.character(data$oldChronoName))))
Warning messages:
1: In FUN(X[[i]], ...) :
Could not retreive values for geo unit "Mioceen" from earthlifeconsortium.org
2: In FUN(X[[i]], ...) :
Could not retreive values for geo unit "Unspecified age" from earthlifeconsortium.org
It would be nice to have some text here for users to explain what these warnings are, and whether they matter.

Spell check

I highlight a few spelling errors (en.US):

Baeysian                          sharks.Rmd:22
conventience                      sharks.Rmd:160
digitised                         nbaR.Rd:12
                                         description:1
donw                              tomato.Rmd:156
errenous                          sharks.Rmd:247
exaclty                           tomato.Rmd:62
expleore                          sharks.Rmd:147
objetcs                           sharks.Rmd:171
respose                           tomato.Rmd:47
retreive                          geo_age.Rd:19
                                  tomato.Rmd:28
whch                              tomato.Rmd:62
worning                           sharks.Rmd:236

ServiceAccessPoints not parsed in fromJSONString

When reading a multimedia record from JSON, it seems to skip the serviceAccessPoints. Testfile:

{"totalSize":1,"resultSet":[{"score":12.3626375,"item":{"sourceSystem":{"code":"OBS","name":"Observation.org - Nature observations"},"sourceSystemId":"18926816","recordURI":"https://observation.org/photo/18926816/","id":"18926816_img@OBS","sourceInstitutionID":"Observation International","sourceID":"Observation.org","owner":"https://observation.org/user/90734/","licenseType":"Copyright","license":"CC BY-NC","unitID":"OBS.18926816_img","collectionType":"Observations","serviceAccessPoints":[{"accessUri":"https://observation.org/photo/18926816.jpg","format":"image/jpeg","variant":"Best Quality"}],"type":"StillImage","taxonCount":0,"associatedSpecimenReference":"164118412@OBS","phasesOrStages":["unknown"],"sexes":["undetermined"],"gatheringEvents":[{"continent":"Europe","country":"The Netherlands","iso3166Code":"NL","provinceState":"Zuid-Holland","locality":"Leiden - Bioscience Park Leiden","dateTimeBegin":"2018-10-31T00:00:00+0000","dateTimeEnd":"2018-10-31T23:59:59+0000","siteCoordinates":[{"longitudeDecimal":4.45,"latitudeDecimal":52.15,"coordinateErrorDistanceInMeters":5000,"spatialDatum":"WGS84","geoShape":{"type":"Point","coordinates":[4.45,52.15]}}]}],"identifications":[{"taxonRank":"multispecies","scientificName":{"fullScientificName":"Geometridae indet."},"defaultClassification":{"kingdom":"Animalia","family":"Geometridae"},"vernacularNames":[{"name":"Unidentified Geometer moth","language":"en","preferred":true}]}]}}]}

Final notes

  • [Done] In the en tibi vignette, the first paragraph is repeated.
  • [Done] Shark vignette: get_age is not a function. Should be geo_age?
  • [Done] Shark vignette: broken images?
  • [Done] Might be better to save the shark phylogeny as an R object (see save and use xz compression for CRAN ). That way the file can be compressed and loading is easier for a user: data(shark_phylogeney). See: http://r-pkgs.had.co.nz/data.html
  • [Done] It is good practice to spell out logicals: T and F should be TRUE and FALSE. (T and F are not protected)
  • [Done] Keep 80 column width in vignette examples please.

Pretty printing?

I wonder whether it would be better to have some more human readable representation when a client query result is printed? The current output is:


> # specify two query conditions
> l <- list(identifications.typeStatus="holotype", sex="female")
> # run query
> res <- client$query(queryParams=l)
> res
<Response>
  Public:
    clone: function (deep = FALSE)
    content: QueryResult, R6
    initialize: function (content, response)
    response: response

Users might expect something that would tell how many results were returned by the query, and maybe some information about the first couple of results. A public print method that provides some of this information would be nice. Note that this might not be necessary to add to these classes if there are wrapper functions that users are calling instead.

data frames

One potential solution that retains robustness to changes in the API would be to include wrapper functions that ingest the R6 objects that are currently created, and from those objects, create more commonly used objects (e.g., data frames, lists, and/or vectors). In particular, functionality to make data frames from query results might be nice, as data frames will generally be a preferred format for people wanting to analyze data. One expectation for new users might be that each record is a row in a data frame. The nested nature of the data does not immediately admit simply generating a table of records, but list columns might help to make this data structure (one row per record) more manageable for users to understand. Adding some functionality like this could make the package more accessible, and would probably be my biggest suggestion for improvement.

error messages

volgens mij hadden we het er al over gehad, maar de errors zijn generiek ("API server error") en soms is voor de leek bijna niet na te gaan wat er aan de hand is, zoals deze:
q1 <- QueryCondition$new(field = "associatedMultiMediaUris.accessUri", operator="EQUALS", value="some value")

geeft een error omdat de URI's niet geïndexeerd zijn. uiteindelijk vind je dat denk ik wel terug in de documentatie, maar dat vereist wel diep graven.

sc?

At this point in the text ….
Note: The function getFieldInfo on a certain field lists which operators are allowed for that field (e.g. sc$get_field_info()$content$unitID$allowedOperators).
I got confused as to what sc was. (Also is it getFieldInfo or get_field_info?) I think this section could be improved. Perhaps a better example showing how a user might look up all the available query options for the specimen service.

allowed_operators <- sc$get_field_info()$content
names(allowed_operators)

Or, perhaps a wrapper function?

Documentation

I found the default R help files (e.g., ?ChronoStratigraphy) to be not super helpful in trying to understand how to use the package. In particular, the "Usage" section usually contains the name of the class, so that users might need to read through the fields and methods to understand how to use the class. This could be hard for typical R users, particularly if they are not familiar with OOP in R6, and it also means that the help pages do not illustrate how the classes fit together. So, one option would be to hand-edit these autogenerated "Usage" sections, but that doesn't seem ideal if the API is going to change frequently. The current solution of leaning more heavily on the vignettes to demonstrate how all of these classes fit together makes sense from the perspective of automation, and it might even be worth linking users to relevant vignettes from the help files. If (human-authored) wrapper functions are added to the package, then those help files could have more detailed "Usage" sections.

On to the vignettes! As the package is currently written, these vignettes are a key resource for users. I really appreciated that the vignettes seemed complete in the sense of covering major functionality, and both of the case study vignettes were fun to work through and will be useful for users who are trying to solve those types of problems. That said, there may be some better ways to structure the vignettes. The "Get started" vignette currently is long, which might deter users who are looking for something like a "Quickstart" guide. Would it make sense to break the current vignette into several smaller vignettes on the basis of which *Client class they use? For example, vignettes for SpecimenClient, TaxonClient, MultimediaClient, GeoClient, and MetadataClient, with names that reflect common use cases that users will recognize, e.g., "Querying specimen records", "Searching by taxon", "Finding and accessing multimedia", etc.?

done

Goodpractice results

By removing the vignettes from the R package itself, you may also fix these issues goodpractice raised:

  ✖ write unit tests for all functions, and all package code in general.
    93% of code lines are covered by test cases.

    R/Feature.r:86:NA
    R/Feature.r:111:NA
    R/Feature.r:112:NA
    R/GeoArea.r:122:NA
    R/GeoArea.r:173:NA
    ... and 628 more lines

  ✖ fix this R CMD check WARNING: LaTeX errors when creating PDF version.
    This typically indicates Rd problems.
  ✖ fix this R CMD check ERROR: Re-running with no redirection of
    stdout/stderr. Hmm ... looks like a package You may want to clean up by 'rm -Rf
    /var/folders/ps/g89999v12490dmp0jnsfmykm0043m3/T//Rtmpn15kYK/Rd2pdf11fbd7d461aa9'
  ✖ fix this R CMD check NOTE: installed size is 6.3Mb sub-directories of
    1Mb or more: R 1.8Mb doc 4.3Mb

Size

I may have missed something here, but for me the size argument didn’t work -- the results were always stuck at 10.

library(nbaR)
sc <- SpecimenClient$new()
qc <- QueryCondition$new(field = 'identifications.defaultClassification.genus',
                         operator = 'EQUALS', value = 'Solanum')
qc2 <- QueryCondition$new(field = 'identifications.defaultClassification.specificEpithet',
                          operator = 'EQUALS', value = 'lycopersicum')
qs <- QuerySpec$new(conditions = list(qc, qc2))
res <- sc$query(querySpec = qs, size = 1000)
res$content$totalSize == length(res$content$resultSet)

Map of tomatoes

There is code, some of which was commented out, placed at the bottom of the En Tibi vignette without any accompanying text. Running this code I was able to generate a map of collected S. lycopersicum specimens. Is there a reason for it to be excluded? I am a little confused with the query for ‘gatheringEvent.siteCoordinates’ -- they are set to ‘NOT_EQUALS’ yet return only specimens which contain values in these slots.

Advanced queries

It would be great if users could use more idiomatic R code for multiple queries, e.g., something like r 'sex == female & identifications.taxonRank == species' rather than needing to create multiple QueryCondition objects and combine them in a QuerySpec object. I'm not sure how easy this would be to implement, but it could increase usability.

Element.R

A file Element.R gets generated by swagger codegen but should not.

QuerySpec size parameter

waarschijnlijk heb je het gewoon nog niet geïmplementeerd, maar in het vignette staat als voorbeeld al ```
qs <- QuerySpec$new(conditions=list(qc), size=1000), maar size doet het niet (het zijn altijd 10 resultaten):
q1 <- QueryCondition$new(field = "sex", operator="EQUALS", value="female")
qs <- QuerySpec$new(conditions=list(q1))
res <- sc$query(querySpec=qs, size=1000)
res$content$resultSet[[11]]$item
Error in res$content$resultSet[[11]] : subscript out of bounds

Check results

Warning: @nord [/Users/djb208/Coding/nbaR/R/Utils.r#27]: has no parameters
Warning: @nord [/Users/djb208/Coding/nbaR/R/Utils.r#205]: has no parameters

  • checking installed package size ... NOTE
    installed size is 5.9Mb
    sub-directories of 1Mb or more:
    R 1.4Mb
    doc 4.3Mb

  • checking top-level files ... NOTE
    Non-standard file/directory found at top level:
    ‘tic.R’
    If the authors wish to submit to CRAN they will likely need to ensure that the package size is less than 5MB. The excessive size of the package is likely due to the images in the vignettes. They could prevent this by halting the building of vignettes in the R package itself or removing the images. I prefer the former option as I feel they could, in fact do with more use-cases (as explained above and below), and they can always refer users to the online documentation in the locally available R help files. If they choose the former, they should refer to this guide: https://pkgdown.r-lib.org/articles/pkgdown.html#articles

Help files

In one of the vignettes, it was suggested I “Have a look at ?Response to see what this object contains.” -- yet the help file had little to no information. I think the level of information found in this help file is not too different from that of others.

I imagine that much of the documentation will have been generated with the code generator but i think it may be necessary to provide more information particularly for the more commonly used functions/classes. In particular, there are very few examples provided for the majority of the help files. I think it is required that all public functions/classes come with examples for any ROpenSci package.

With roxygen, examples can be stored in an examples/ folder and called in using roxygen syntax.

Differentiating among clients

https://github.com/naturalis/nbaR/blob/master/vignettes/nbaR.Rmd#L474

It would be great to have some text here to explain how the taxonomic data services and TaxonClient use cases differ from the SpecimenClient use cases described earlier in the vignette. (This would be helpful regardless of whether the "Get started" vignette stays as one big document, or is split up.) Same comment for the geographic data services: https://github.com/naturalis/nbaR/blob/master/vignettes/nbaR.Rmd#L616

associated multimedia objects

en een ding dat ik niet helemaal begrijp: in specimen-resultaten met multimedia krijg ik naast associatedMultiMediaUris ook associatedMultiMediaObjects terug (zonder iets er in). wat zijn dat?

More information on operators

It would be great if the different operators could be demonstrated. For example, I wanted to download the taxonomic records for a given list of genera. Initially I was trying to compile multiple query conditions using ‘OR’. But I since discovered that it could be easily done using the ‘IN’ operator with value as a vector of genera.

I noticed there are lots of operators available… would it be too much to ask for more examples that involve other operators other than just EQUALS?

Query/filter specimen that have value in certain field

One aspect I wasn't able to do was call a specimen given that it had data within a certain field. E.g If i want to call specimens of a particular species given that "$item$gatheringEvent$siteCoordinates" has a value.
How would I go about doing that?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.