atlasoflivingaustralia / ala4r

Access data and resources hosted by the Atlas of Living Australia (ALA)

Home Page: https://atlasoflivingaustralia.github.io/ALA4R/

R 100.00%

ala4r's Issues

Consider changing image_info() to use web service

Currently the image_info() function scrapes its information from a web page. Ideally it should use a web service instead, but that may require implementing one on the server side. When image_info() was written, no such service existed.

Check old bug with missing taxonomic info

See specieslist.R. Is this still an issue?

NOTE March 2017: the response object might include records with missing taxonomic information. This is an issue with the ALA server-side systems; see AtlasOfLivingAustralia/bie-index#134

Problems with Status Code 417 errors

Hi there,

I think the functionality provided by ALA4R is really useful, and I have been trying to use it in a project. However, I keep running into errors. I get Status Code 417 errors for any query using intersect_points where the number of points is greater than about 30 (the help says the limit is 100,000 points). Here is an example:

library(ALA4R)
pts <- cbind(lat = runif(1000, -40, -12), long = runif(1000, 115, 148))
intersect_points(pts, c("el707","el830")) ## two randomly chosen layers

Error in check_status_code(h$value()[["status"]], extra_info = diag_message) : 
ALA4R: HTTP status code 417 received.
Either there was an error with your request, in the ALA4R package, or the ALA servers are down. Please try again later and notify the package maintainers if you still have problems.

intersect_points(pts[1:25, ], c("el707","el830"))

    latitude longitude temperatureWarmestMonthMin distanceToPermanentWaterWeighted
1  -12.50860  145.7189                         NA                               NA
2  -21.41939  133.5143                      23.62                       0.39824614
3  -22.28138  120.4245                      23.87                       0.41593270
4  -26.60327  137.9844                      23.03                       0.62936470
5  -30.88682  131.9950                      16.23                       1.51092680
6  -32.61151  135.4042                      14.48                       1.30923640
7  -19.02391  129.0999                      23.29                       0.56320510
8  -13.38039  146.1419                         NA                               NA
9  -25.53504  144.0971                      22.04                       0.12041594
10 -12.54634  137.3376                         NA                               NA
11 -26.81527  144.0627                      21.62                       0.08246211
12 -33.61260  134.8484                      14.72                       1.11606450
13 -38.80239  134.7262                         NA                               NA
14 -37.19952  125.4121                         NA                               NA
15 -28.86901  139.5483                      21.89                       0.39217340
16 -31.11741  140.1852                      19.26                       1.02200780
17 -26.64699  119.1863                      22.59                       0.58137770
18 -39.35150  119.6296                         NA                               NA
19 -14.27043  128.7560                         NA                               NA
20 -17.32447  141.9878                      23.98                       0.02828427
21 -28.98620  116.3360                      19.56                       0.33615473
22 -17.77501  121.8608                         NA                               NA
23 -12.82748  128.8183                         NA                               NA
24 -17.31647  116.3364                         NA                               NA
25 -13.68585  138.9851                         NA                               NA

intersect_points(pts[1:35, ], c("el707","el830"))

Error in check_status_code(h$value()[["status"]], extra_info = diag_message) : 
ALA4R: HTTP status code 417 received.
Either there was an error with your request, in the ALA4R package, or the ALA servers are down. Please try again later and notify the package maintainers if you still have problems.

This seems to happen with pretty much any layer I choose. I have tried reinstalling ALA4R but I still get the error, and verbose = TRUE gives me no further information.

Any ideas what is going on?
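A possible client-side workaround, until the cause of the 417 is found, is to send the points in small batches (25 rows worked above, 35 did not) and bind the results. A minimal sketch, assuming each batch returns a data frame with the same columns; `intersect_in_chunks()` is a hypothetical helper, not part of ALA4R, and the batch size comes from this report rather than from any documented limit.

```r
# Hypothetical workaround: query intersect_points() in small batches and bind the results.
# `fun` is injectable so the batching logic can be tried without network access.
intersect_in_chunks <- function(pts, layers, chunk_size = 25,
                                fun = function(p, l) ALA4R::intersect_points(p, l)) {
  n <- nrow(pts)
  if (is.null(n) || n == 0) stop("no points supplied")
  # Assign each row to a batch of at most `chunk_size` rows
  batch_id <- ceiling(seq_len(n) / chunk_size)
  pieces <- lapply(split(seq_len(n), batch_id), function(rows) {
    fun(pts[rows, , drop = FALSE], layers)
  })
  # Stack the per-batch data frames, keeping the original row order
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}
```

Usage would be something like `intersect_in_chunks(pts, c("el707", "el830"))`. Passing `fun` explicitly lets the batching be checked offline with a stand-in function.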

search_names not returning exact matches

I've encountered instances where a search for a subspecies returns a hybrid instead of the ALA taxon with an exact name match. E.g.:

search_names('Asplenium obtusatum subsp. northlandicum')
                                searchTerm                                                                               name       rank                                           guid
1 Asplenium obtusatum subsp. northlandicum Asplenium bulbiferum subsp. gracillimum x Asplenium obtusatum subsp. northlandicum subspecies urn:lsid:biodiversity.org.au:apni.taxon:320470

But taxon Asplenium obtusatum subsp. northlandicum does exist in ALA. Changing the search string to "Asplenium obtusatum ssp. northlandicum" returns the expected taxon.

search_names('Asplenium obtusatum ssp. northlandicum')
                              searchTerm                                     name       commonName       rank                                           guid
1 Asplenium obtusatum ssp. northlandicum Asplenium obtusatum subsp. northlandicum Shore Spleenwort subspecies urn:lsid:biodiversity.org.au:apni.taxon:269397

I'm not sure if this is a biocache issue or an ALA4R issue... a GET request using the species search web service returns the expected taxon, as well as the hybrid (see http://bie.ala.org.au/ws/search.json?q=Asplenium%20obtusatum%20subsp.%20northlandicum).

Other similarly-affected taxa include:

Asplenium bulbiferum subsp. gracillimum
Daviesia mimosoides subsp. mimosoides
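Since the BIE search URL above returns both the hybrid and the exact taxon, one way to avoid this on the client side is to prefer a candidate whose name matches the search term exactly (ignoring case and extra whitespace). A minimal sketch over a data frame of candidates with `name` and `guid` columns, as in the search_names() output above; `pick_exact_match()` is hypothetical and does not reflect how ALA4R currently chooses a match.

```r
# Hypothetical re-ranking step: prefer an exact (normalised) name match,
# otherwise fall back to the first candidate returned by the search.
pick_exact_match <- function(candidates, search_term) {
  if (nrow(candidates) == 0) return(candidates)
  # Normalise case and whitespace before comparing names
  norm <- function(x) tolower(gsub("\\s+", " ", trimws(x)))
  exact <- norm(candidates$name) == norm(search_term)
  exact[is.na(exact)] <- FALSE
  if (any(exact)) {
    candidates[which(exact)[1], , drop = FALSE]
  } else {
    candidates[1, , drop = FALSE]
  }
}
```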

Fetching data with ALA4R from Atlas instances in Europe fails when sourceTypeId parameter is provided through occurrences() function, succeeds when not

We use ALA4R to retrieve occurrence data from Atlas instances in Europe. The request for fetching the occurrences appears to follow this form:

/occurrences/index/download?q=Circus%20macrourus&reasonTypeId=10&sourceTypeId=2001&esc=%5C&sep=%09&file=data

However, these kinds of requests fail for all the instances except the one in Australia. When we change the request to omit the sourceTypeId=2001 parameter, it succeeds for all instances of biocache-service.

Could this be a configuration issue, where the European instances of the Atlas do not support the sourceTypeId parameter? Or is some data missing from that service?

To replicate please use this R code:

library(ALA4R)

ala_config(verbose=TRUE)
ala_config(caching="off")
server_config <- getOption("ALA4R_server_config")

#ws1 <- "http://records-ws.als.scot/"
#ws1 <- "http://recherche-ws.gbif.fr/"
#ws1 <- "http://datos.gbif.es/biocache-service/"
ws1 <- "http://biocache.ala.org.au/ws/"

ws2 <- "http://logger.ala.org.au/service/logger/"

server_config$base_url_biocache <- ws1
server_config$base_url_logger <- ws2

options(ALA4R_server_config = server_config)

occurrences(
  taxon = "Apus apus", 
  #record_count_only = TRUE, 
  download_reason_id = 10
)

Cannot locate citation.csv

occurrences is throwing a warning message about not finding citation.csv. It seems that since the revamp of ALA, citation.csv is no longer included in the download.

e.g.:

occ <- occurrences('lsid:http://id.biodiversity.org.au/name/apni/245602', 
                   fq='geospatial_kosher:true', 
                   download_reason_id=7)$data
Warning message:
In open.connection(file, "rt") :
  cannot locate file 'citation.csv' in zip file 'C:\Users\John\AppData\Local\Temp\RtmpIdSthN/23cf28750657a4b5079f891547c21398'

Make sure CI account/badges point to ALA organisation

The AppVeyor badge points to the raymondben account rather than the AtlasOfLivingAustralia account, and should be changed.

Occurrences download broken

Back-end server changes seem to have caused some collateral damage, some parts of occurrences() are not currently working. See AtlasOfLivingAustralia/biocache-service#158

Document API dependencies

Make a list of the web services used in ALA4R. This can be done by running grep over the test output.

occurrences not parsing data correctly when use_data_table is TRUE

occurrences() does not parse data correctly when use_data_table=TRUE. Example below.

x=occurrences(taxon="Chlorophyllum molybdites", download_reason_id=10)
x[[1]][1,]
[1] "5b65daad-4fed-45ff-a70a-5ead010573a2\"\t\"PERTH 8243050\"\t\"08159dd5-b9f7-4758-af64-a40852bd869a\"\t\"Chlorophyllum molybdites (G.Mey.) Massee\"\t\"\"\t\"Chlorophyllum molybdites\"\t\"species\"\t\"False Parasol\"\t\"Fungi\"\t\"Basidiomycota\"\t\"Agaricomycetes\"\t\"Agaricales\"\t\"Agaricaceae\"\t\"Chlorophyllum\"\t\"Chlorophyllum molybdites\"\t\"PERTH\"\t\"PERTH\"\t\"Packsaddle Road, Kununurra\"\t\"-15.85111111\"\t\"128.73305556\"\t\"GDA94\"\t\"-15.85111111\"\t\"128.73305556\"\t\"100.0\"\t\"Australia\"\t\"Victoria Bonaparte\"\t\"\"\t\"Western Australia\"\t\"Wyndham-East Kimberley (S)\"\t\"Byrne, R.\"\t\"2010\"\t\"01\"\t\"2010-01-27\"\t\"PreservedSpecimen\"\t\"PreservedSpecimen\"\t\"\"\t\"el889\"\t\"noIssue\"\t\"true\"\t\"false\"\t\"true\"\t\"true\"\t\"false\"\t\"false\"\t\"false\"\t\"false\"\t\"false\"\t\"false\"\t\"false\"\t\"false\"\t\"false\"\t\"true"

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.4 ALA4R_1.066      devtools_1.6.1   jsonlite_0.9.13  httr_0.5        

loaded via a namespace (and not attached):
 [1] assertthat_0.1  chron_2.3-45    digest_0.6.4    grid_3.1.1      lattice_0.20-29 plyr_1.8.1          Rcpp_0.11.3     RCurl_1.95-4.3  reshape2_1.4   
[10] rgdal_0.9-1     sp_1.0-15       stringr_0.6.2   tools_3.1.1

How to allow spaces in fq parameter

There seems to be a recent change such that the following now returns zero records (was working fine several months back):

x <- ALA4R::specieslist(taxon = "Acacia", fq = c("state:New South Wales"))

If I change New South Wales to Queensland all is well:

x <- ALA4R::specieslist(taxon = "Acacia", fq = c("state:Queensland"))

I tried changing the spaces to obvious things such as + or %20 without luck. Any advice would be greatly appreciated! :)
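Solr's standard query syntax treats unquoted whitespace as a separator, so `state:New South Wales` is likely parsed as `state:New` followed by two free-text terms, whereas `state:Queensland` is a single term. One thing to try is quoting the value as a phrase, i.e. `state:"New South Wales"`. A minimal sketch of building such clauses; `fq_phrase()` is hypothetical, and it has not been checked whether ALA4R's own fq validation or the ALA service accepts quoted values.

```r
# Hypothetical helper: build an fq clause, quoting the value as a phrase
# when it contains whitespace.
fq_phrase <- function(field, value) {
  value <- gsub('"', '\\\\"', value)  # escape any embedded double quotes
  ifelse(grepl("\\s", value),
         sprintf('%s:"%s"', field, value),
         sprintf("%s:%s", field, value))
}
```

Usage would be something like `specieslist(taxon = "Acacia", fq = fq_phrase("state", "New South Wales"))`.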

Google analytics

Investigate identifying ALA4R usage through Google Analytics. ALA4R already populates the User-Agent header of its requests, which could help here.

occurrences failing due to no citation.csv

It seems that citation.csv is no longer included in the archives downloaded by occurrences, and so the following (in occurrences) triggers an error:

xc = read.table(unz(thisfile, "citation.csv"), header = TRUE, 
        comment.char = "", as.is = TRUE)

E.g.:

occurrences('Acacia kingiana', download_reason_id=10)
## Please use (fixed|coll|regex)(x, ignore_case = TRUE) instead of ignore.case(x)
## Please use (fixed|coll|regex)(x, ignore_case = TRUE) instead of ignore.case(x)
## Error in open.connection(file, "rt") : cannot open the connection
## In addition: Warning messages:
##   1: closing unused connection 7 (C:\Users\John\AppData\Local\Temp\RtmpAtgTb8/54bb20636a2fef42b96ca50024152010:citation.csv) 
## 2: In open.connection(file, "rt") :
##   cannot locate file 'citation.csv' in zip file 'C:\Users\John\AppData\Local\Temp\RtmpAtgTb8/3b7d4c346db6d4687a8f00515ed42c98'
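One way to make occurrences() tolerant of this is to check the archive listing before reading citation.csv, and to warn instead of failing when it is absent. A minimal sketch; `read_citation_if_present()` is hypothetical, and the read.table() arguments are copied from the snippet above. `utils::unzip(list = TRUE)` lists the archive contents without extracting them; the lister is a parameter only so the missing-file path can be exercised without a real download.

```r
# Hypothetical replacement for the unconditional read of citation.csv:
# read it only when the downloaded archive actually contains it.
read_citation_if_present <- function(zipfile, member = "citation.csv",
                                     list_members = function(z) utils::unzip(z, list = TRUE)$Name) {
  if (!member %in% list_members(zipfile)) {
    warning(sprintf("'%s' not found in download archive; skipping citation info", member))
    return(NULL)
  }
  utils::read.table(unz(zipfile, member), header = TRUE,
                    comment.char = "", as.is = TRUE)
}
```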

Change Macropus genus in example

Some species formerly in Macropus are now placed in Osphranter. Either change the genus name in the examples or switch to a less-abundant example species.

Case study - text mining Darwin Core terms

Question: Can the constrained and unconstrained values in a subset of Darwin Core terms (http://rs.tdwg.org/dwc/terms) be classified into a limited number of common terms/classes?

Darwin Core terms suggested: sex, lifeStage, establishmentMeans, record type (basisOfRecord; a mix of humanObservation, machineObservation, etc.), presence/absence (occurrenceStatus), identificationQualifier, specimen type (typeStatus).

Strategy: ??

search_names and hyphenation

Something weird is going on with hyphenation in search_names. Take for example the species Acaena novae-zelandiae:

search_names('Acaena novae zelandiae') returns

  searchTerm               name                     commonName                                       rank      guid                                            
1 "Acaena novae zelandiae" "Acaena novae-zelandiae" "Biddy Biddy, Biddy-widdy, Bidgee-widgee, Buzzy" "species" "urn:lsid:biodiversity.org.au:apni.taxon:376906"

but search_names('Acaena novae-zelandiae') returns an empty matrix.

Case study - area report

Question: What species are in an area defined by a gazetteer polygon?

Strategy: Find PID of polygon and get WKT to cookie-cut occurrence records

Check variable renaming

When ALA4R was written, the variable names returned by different services were inconsistent, so there is various variable-renaming code that tries to make these names more consistent (see primarily rename_variables() in utilities_internal.R). Is this renaming code still required? (Perhaps the server-issued names are now consistent.) If it is required, does it need updating?

Suspected.outlier field no longer in occurrence download?

I've noticed that a field I've used in the past does not seem to be included in occurrence downloads anymore. I believe the field was called Suspected.outlier but might've been detectedOutlier in ALA4R::occurrences(). Downloads I did using http://biocache.ala.org.au/ws/occurrences/index/download back in July 2016 included this field, and I feel like I've seen it recently, but it's no longer included when I run the same download, or when using occurrences().

Do you guys keep track of these sorts of things? I figured you might need to know as you seem to prettify the naming of the fields.

Thanks!

coerce factor to character in search_names

Just a suggestion - I'm often caught out with taxon names stored as factor, e.g. when I forget to set stringsAsFactors=FALSE when reading in data, or if passing in results of taxize::gbif_parse.

Of course a sensible solution is to remember to set stringsAsFactors=FALSE, and to suggest that gbif_parse returns a character vector for its scientificname element.

However, I wonder if it would be worth coercing factors to character within search_names as well?
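A minimal sketch of the suggested coercion, as a check that could run at the top of search_names(); `as_taxon_names()` is a hypothetical helper name, not an existing ALA4R function.

```r
# Hypothetical input normalisation: accept factors by converting them to
# character before any string handling happens.
as_taxon_names <- function(taxa) {
  if (is.factor(taxa)) taxa <- as.character(taxa)
  if (!is.character(taxa)) stop("taxa must be a character vector or factor")
  taxa
}
```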

Unit tests skipped or failed

Warnings and errors in the testthat output need to be investigated. The problems may be caused by infrastructure changes or data changes. Output attached. Of particular interest:

test-lists.R#23
test-occurrences.R#82
test-occurrences.R#83
test-occurrences.R#84
test-occurrences.R#101
test-search-guids.R#21
test_output.txt

specieslist() returns error when fq is only parameter specified

E.g.:

> x <- ALA4R::specieslist(fq = c("kingdom:Plantae", "state:New South Wales"))
Error in ALA4R::specieslist(fq = c("kingdom:Plantae", "state:New South Wales")) : 
  invalid request: need either fq or wkt parameter to be specified

It works fine if I add a wkt parameter.

Typo in assertion names

Typo in Conversion (missing its s):

> grep('Converion', ala_fields('assertions')$name, value=TRUE)
[1] "decimalLatLongConverionFailed"

outlierForLayer field provides data for only one layer

This is related to #27.

As an example, the online search for Acacia cangaiensis produces one record that is flagged as an outlier for three layers, Bio15, Bio17 and Bio26.

https://biocache.ala.org.au/occurrences/d97cd2e1-c871-4be5-bd50-2b963f210902

However, the data downloaded via ALA4R give only one layer, el882, which corresponds to Bio15.

Can more information be packed into this field? Or a new field be provided? A comma separated list should work well enough to state which layers a record is an outlier for.

Code to reproduce is below.

Thanks,
Shawn.

library(ALA4R)

search_term = "Acacia cangaiensis"
wkt_text = "POLYGON((154 -43.74,154 -9,112.9 -9,112.9 -43.74,154 -43.74))"

ala = occurrences(taxon=search_term, wkt=wkt_text, download_reason_id=7)
ala$data = ala$data[!(is.na(ala$data$longitude) | is.na(ala$data$latitude)),]
ala$data[ala$data$id == 'd97cd2e1-c871-4be5-bd50-2b963f210902', 'outlierForLayer']

Extra tests to cover downstream packages

Packages that depend on ALA4R are being developed (e.g. https://github.com/BiologicalRecordsCentre/NBN4R) and so ALA4R may need some additional tests to cover functionality that is not yet well tested but which is important for those packages. See BiologicalRecordsCentre/NBN4R#6.

Example in vignette fails

I'm trying to get this package to pass devtools::check() and there are a few issues cropping up.

I cannot get line 173 in the vignette to pass:

tx=taxinfo_download("family:SPHENISCIDAE",fields=c("guid","genus","nameComplete","rank"))

I get

Error in taxinfo_download("family:SPHENISCIDAE", fields = c("guid", "genus",  : 
  invalid fields requested: genus. See ala_fields("general",as_is=TRUE)

If I remove "genus" from the fields argument I still get an error

 Error in check_status_code(h$value()[["status"]], on_redirect = on_redirect,  : 
  ALA4R: HTTP status code 500 received.
  Either there was an error with the request, or the ALA service may be down (try again later). Notify the package maintainers if you still have problems.

Any ideas?

Potential changes for biocache doi downloads

Biocache download (occurrence) files are now hosted by the DOI service; however, a DOI is not yet compulsory for every download.

Once every download generates a DOI, the download files will be hosted by the DOI service.

If ALA4R consumes files containing sensitive data, it will need to implement authentication to retrieve those files from the DOI service.

See:

AtlasOfLivingAustralia/biocache-service#187: /occurrences/offline/status should use the doi file location.
AtlasOfLivingAustralia/doi-service#32: Restrict Access to files with sensitive Data

problem accessing records using occurrences()

On running the following R code, I receive an error that includes an HTTP "301 Moved" response from the server. I first received it on 12/03/2019; the same code worked with no issue on 11/03/2019, so I assume some server changes or updates are causing this.

query <- sprintf('taxon_name:"%s"', species)  # species: a scientific name defined earlier in the script
temp <- occurrences(taxon=query,download_reason_id=4, use_data_table=TRUE)
Error in parse_con(txt, bigint_as_char) :
lexical error: invalid char in json text.
<title>301 Moved
(right here) ------^

Do you require more detail?

species_info error - dimensions dropped

For some taxa (maybe largely constrained to those without "proper" guids?), species_info fails to return the record.

e.g. for http://bie.ala.org.au/ws/species/ALA_Caladenia_cardiochila

> species_info(guid='ALA_Caladenia_cardiochila')
Error in subset.default(out$classification, select = tempcols) : 
  argument "subset" is missing, with no default
> species_info('Caladenia cardiochila')
Error in subset.default(out$classification, select = tempcols) : 
  argument "subset" is missing, with no default

Similarly for ALA_Pterostylis_squamata.

Seems that in species_info, out[[k]] = out[[k]][, tempcols] needs to be out[[k]] = out[[k]][, tempcols, drop=FALSE] to accommodate these, since e.g. family is the only field present in $classification.
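The failure comes from R's default of dropping dimensions when `[` selects a single column: the result becomes a plain vector, and subset() on a vector then complains about the missing `subset` argument. A small self-contained illustration; the one-column table stands in for `out$classification`, and its value is illustrative.

```r
# Stand-in for out$classification when only `family` is present
classification <- data.frame(family = "Orchidaceae", stringsAsFactors = FALSE)
tempcols <- "family"

# Default `[` drops to a character vector when one column is selected
dropped <- classification[, tempcols]

# drop = FALSE keeps a one-column data frame, so subset(select = ...) still works
kept <- classification[, tempcols, drop = FALSE]
kept_subset <- subset(kept, select = tempcols)
```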

fq matches on empty string in specieslist()

Just noticed this when using specieslist(), e.g.:

> wktPoly <- "POLYGON((152.38 -30.43,152.5 -30.43,152.5 -30.5,152.38 -30.5,152.38 -30.43))"
> x <- ALA4R::specieslist(wkt = wktPoly, fq = "kingdom:Plantae")
> table(x$kingdom, useNA = "always")

        Plantae    <NA> 
    156     964       0

So having specified fq = "kingdom:Plantae" we have 156 records with empty string for kingdom.

In some ways I can see why it is informative to include these records with missing values, so I'm not sure if this behaviour is by design. But perhaps an option in the style of na.rm could be included?
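Until such an option exists, the blank rows can be removed after the call. A minimal sketch of an na.rm-style post-filter; `drop_blank()` is hypothetical and assumes the result is a data frame with a `kingdom` (or other named) column, as in the table above.

```r
# Hypothetical post-filter: drop rows whose value in `field` is NA or an empty string.
drop_blank <- function(x, field) {
  vals <- as.character(x[[field]])   # nzchar() does not accept factors
  keep <- !is.na(vals) & nzchar(vals)
  x[keep, , drop = FALSE]
}
```

Usage would be something like `drop_blank(x, "kingdom")` on the specieslist() result.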

search_names inconsistency

If an unrecognised taxon name is passed to search_names, it returns

<0 x 0 matrix>

But if a vector of unrecognised names (i.e. none of them are recognised) is passed, it returns

Error in if (!empty(x)) { : missing value where TRUE/FALSE needed

e.g.

search_names('Foo')
# <0 x 0 matrix>

and

search_names(c('Foo', 'Bar', 'Baz'))
#   searchTerm name                                              rank      guid                                            
#1 "Foo"      NA                                                NA        NA                                              
#2 "Bar"      "Acacia sp. Marble Bar (J.G. & M.H.Simmons 3499)" "species" "urn:lsid:biodiversity.org.au:apni.taxon:710866"
#3 "Baz"      NA                                                NA        NA

but

search_names(c('Foo', 'Baz'))
# Error in if (!empty(x)) { : missing value where TRUE/FALSE needed
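The error comes from an `if()` whose condition evaluates to NA rather than TRUE/FALSE. One defensive option is an emptiness check that always returns a single logical value and treats an all-NA result as empty. A minimal sketch; `is_empty_result()` is hypothetical, and it is an assumption that the NA arises from the all-unmatched case shown above.

```r
# Hypothetical emptiness check that always returns a single TRUE/FALSE.
is_empty_result <- function(x) {
  if (is.null(x)) return(TRUE)
  if (is.data.frame(x) || is.matrix(x)) return(nrow(x) == 0 || ncol(x) == 0)
  # Vectors and lists: empty if zero-length or entirely NA
  length(x) == 0 || all(is.na(x))
}
```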

Filtering `occurrences` by date with `fq`

occurrences does not play nicely with date ranges passed to occurrence_year via fq, such as:

occurrences(taxon='lsid:urn:lsid:biodiversity.org.au:afd.taxon:ba8d0c3b-9753-46cf-87b4-a1b9ec290634',
            fq='occurrence_year[2000-01-01T00:00:00Z TO 2020-01-01T23:59:59Z]',
            record_count_only=TRUE)
## ...
## Error in check_fq(fq, type = "occurrence") : 
## invalid fields in fq: occurrence_year[2000-01-01T00, 2020-01-01T23. See ala_fields("occurrence_indexed")

yet http://biocache.ala.org.au/ws/occurrences/search?q=lsid:urn:lsid:biodiversity.org.au:afd.taxon:ba8d0c3b-9753-46cf-87b4-a1b9ec290634&fq=occurrence_year:[2000-01-01T00:00:00Z%20TO%202020-01-01T23:59:59Z]&pageSize=0 returns the expected, date-filtered result.

URL encoding the spaces and/or colons in the occurrences call doesn't get around this (returns count=0).

Am I using the wrong incantation here, or might check_fq be adjusted to permit this type of date filtering? Skipping over check_fq results in a working url being constructed, and the correct count (35693, today at least) being returned.
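Note that the working URL uses `occurrence_year:[... TO ...]`, with a colon after the field name, whereas the fq in the R call omits it. Even with the colon, the bracketed range contains its own colons and spaces, so a checker that scans the whole string for `field:` tokens can misread the timestamps as field names. A minimal sketch of a field extractor that only looks at the leading `field:` of each clause; `fq_field_names()` is hypothetical, not check_fq(), and it handles one clause per fq element (compound clauses joined with AND/OR are not split).

```r
# Hypothetical fq field extractor that understands Solr range syntax:
# only the leading `field:` of each clause is taken, so colons inside the
# bracketed range are not mistaken for field names.
fq_field_names <- function(fq) {
  pattern <- "^\\s*-?([A-Za-z0-9_.]+):.*$"
  # NA marks clauses with no recognisable `field:` prefix
  ifelse(grepl(pattern, fq), sub(pattern, "\\1", fq), NA_character_)
}
```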

wkt polygon validation gives false negatives when spaces follow commas

The check_wkt::is_valid_wkt_polygon function needs to strip whitespace around commas.

The following wkts are identical, and as far as I can see valid, but the first one results in a warning of "WKT string appears to be invalid".

This is caused by the string match used in is_valid_wkt_polygon, which compares "154 -43.74" with " 154 -43.74".

wkt_text = "POLYGON((154 -43.74, 154 -9, 112.9 -9, 112.9 -43.74, 154 -43.74))"

wkt_text = "POLYGON((154 -43.74,154 -9,112.9 -9,112.9 -43.74,154 -43.74))"

I assume it is also affected by spaces before the comma.

The simplest solution would probably be to give str_split a pattern like "\\s*,\\s*".

Regards,
Shawn.
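A sketch of the suggested fix, splitting on commas with optional surrounding whitespace so both strings above yield the same vertices and the first/last comparison succeeds. It handles only a single-ring POLYGON; `wkt_vertices()` and `ring_is_closed()` are hypothetical helpers, not the package's is_valid_wkt_polygon(), and they use base R strsplit() rather than stringr's str_split().

```r
# Extract the vertex list of a single-ring POLYGON, splitting on commas with
# optional whitespace on either side and normalising internal spacing.
wkt_vertices <- function(wkt) {
  inner <- sub("^\\s*POLYGON\\s*\\(\\((.*)\\)\\)\\s*$", "\\1", wkt, ignore.case = TRUE)
  vertices <- trimws(strsplit(inner, "\\s*,\\s*")[[1]])
  gsub("\\s+", " ", vertices)
}

# A polygon ring needs at least four vertices and must end where it starts
ring_is_closed <- function(wkt) {
  v <- wkt_vertices(wkt)
  length(v) >= 4 && identical(v[1], v[length(v)])
}
```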

geodeticDatumOriginal instead of geodeticDatum?

For a record such as this, where supplied geodetic datum is not WGS84, occurrences returns the supplied datum (epsg:4202 for the record above) in the geodeticDatum element.

Would this be better returned as geodeticDatumOriginal, with the processed datum given in geodeticDatum instead (since the coordinates given in latitude and longitude seem to correspond to WGS84)?

CRAN submission

Hi there, any plans to put this up on CRAN?

Check unwanted_columns()

See unwanted_columns() in utilities_internal.R, which drops some columns from returned data objects. Needs checking to make sure it is still correct (no new columns to add? Old columns no longer present?)

Case study - Latitudinal gradient analysis

Question: How do the four summary measures of 'biodiversity' that are updated monthly vary across latitude in Australia (land/terrestrial)? The summary layers are

Occurrence density
Species richness
Shannon's diversity
Endemism

Strategy: Generate a grid of points and intersect these with the four summary layers, sum across longitudes and graph against latitude.
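The grid and aggregation steps of this strategy can be sketched offline. The bounding box and step below are illustrative assumptions. The layer IDs for the four summary layers are not given here, and the intersection itself would go through intersect_points() (possibly in batches, given the 417 reports). `make_grid()` and `sum_by_latitude()` are hypothetical helpers.

```r
# Regular grid of points; bounds roughly cover mainland Australia (illustrative)
make_grid <- function(lat_range = c(-44, -10), long_range = c(113, 154), step = 0.5) {
  expand.grid(latitude  = seq(lat_range[1], lat_range[2], by = step),
              longitude = seq(long_range[1], long_range[2], by = step))
}

# Sum one layer's values across longitudes within each latitude row
sum_by_latitude <- function(df, value_col) {
  agg <- aggregate(df[[value_col]], by = list(latitude = df$latitude),
                   FUN = sum, na.rm = TRUE)
  names(agg)[2] <- value_col
  agg
}

# Typical flow: intersect the grid with each summary layer (network step, not shown),
# join the layer values onto the grid, then call sum_by_latitude() per layer and plot.
```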

Input check in sites_by_species()

See this note in the sites_by_species() function code:

TODO need way to better check input species query. If the query is incorrect, the call will fail with message along the lines of: "Error in sites_by_species(taxon = "gen:Eucalyptus", wkt = "POLYGON((144 -43,148 -43,148 -40,144 -40,144 -43))", : Error processing your Sites By Species request. Please try again or if problem persists, contact the Administrator."

fails: ss <- sites_by_species(taxon="rk_genus:Eucalyptus",wkt="POLYGON((144 -43,148 -43,148 -40,144 -40,144 -43))",verbose=TRUE)
fails: ss <- sites_by_species(taxon="scientificNameAuthorship:Maiden",wkt="POLYGON((144 -43,148 -43,148 -40,144 -40,144 -43))",verbose=TRUE)
fails: ss <- sites_by_species(taxon="parentGuid:http://id.biodiversity.org.au/node/apni/6337078",wkt="POLYGON((144 -43,148 -43,148 -40,144 -40,144 -43))",verbose=TRUE)

Sites by species failing while waiting for task to complete

The sites_by_species function in ALA4R expects the task status in PointsToGrid to return a status value of <=1 while it is busy off in the background.
(status values: 0 = in_queue, 1 = running, 2 = cancelled, 3 = error, 4 = finished )

In recent days, the status query on the task takes a while to return an HTTP 200 with the status value in it (e.g. 0, 1, 2, 3, 4).

Instead, it returns HTTP 500 for a while until it comes back with a "finished" status. That is not technically correct, because the job is actually still processing. ALA4R should be able to treat a 500 as a genuine error.

A 5 second delay before checking the status doesn't help.

@adam-collins @djtfmartin is there a spatial services issue that this can be linked to or should I create one?
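For reference, the polling decision can be written as a small pure function. A minimal sketch, assuming the status values listed above; `next_poll_action()` is hypothetical. It takes the stopgap view of tolerating a bounded number of 5xx responses while the task runs. The point raised above, that a 500 ought to mean a real error, argues for fixing the service instead, in which case `max_5xx` would be 0.

```r
# Hypothetical polling decision for a PointsToGrid task.
# Task status values: 0 = in_queue, 1 = running, 2 = cancelled, 3 = error, 4 = finished.
next_poll_action <- function(http_status, task_status = NA, n_5xx = 0, max_5xx = 5) {
  if (http_status >= 500) {
    # Tolerate a limited number of 5xx responses before giving up
    return(if (n_5xx < max_5xx) "retry" else "fail")
  }
  if (http_status != 200 || is.na(task_status)) return("fail")
  switch(as.character(task_status),
         "0" = ,
         "1" = "wait",
         "4" = "done",
         "fail")  # 2 = cancelled, 3 = error, anything else unexpected
}
```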

Occurrences failures

(Issue migrated from #7 (comment))

Over the last 24 hours the occurrences function is producing errors that I haven't seen previously. Any ideas? e.g.

x=occurrences(taxon="penguins", download_reason_id=10)
Error in read.table(unz(thisfile, filename = "data.csv"), header = TRUE, :
no lines available in input

To what does the geodeticDatum (occurrences) field relate?

Consider x <- occurrences('Diplolaena grandiflora', fq='geospatial_kosher:true', download_reason_id=10).

Is x$data$geodeticDatum the datum associated with x$data$latitude and x$data$longitude, or have the coordinates given in those fields already been transformed to WGS84 (or GDA94? - though they are basically identical AFAIK)?

If they have been transformed, then:

perhaps the name "geodeticDatumOriginal" might be more appropriate; and

What does this mean for the following row:

subset(x$data, id=='5400e754-d39c-4814-82ca-2f29b40fb3e2', 
       c('id', 'geodeticDatum', 'latitude', 'longitude', 
         'geodeticDatumAssumedWgs84', 'unrecognizedGeodeticDatum'))


#                                       id geodeticDatum latitude longitude geodeticDatumAssumedWgs84 unrecognizedGeodeticDatum
#139 5400e754-d39c-4814-82ca-2f29b40fb3e2      113.5681 -25.9764  113.5681                     FALSE                      TRUE

If the datum is unrecognised, and it's not assumed to be WGS84, what datum is assumed? In this case latitudeOriginal and longitudeOriginal match latitude and longitude, so it seems no transformation has taken place. Note also that the original occurrences() query requested geospatial_kosher records only - should records such as 5400e754-d39c-4814-82ca-2f29b40fb3e2 be omitted?

Check for un-exposed API functionality

ALA4R does not expose the full API suite: need to check what else should be added to ALA4R?

Perhaps a region list, and downloading a shape object as WKT (see http://regions.ala.org.au/regions/regionList and http://spatial.ala.org.au/ws/shape/wkt/{pid}), or other ways of getting regions as WKT to pass to e.g. occurrences() or specieslist().

ALA4R Travis build failing

https://travis-ci.org/AtlasOfLivingAustralia/ALA4R/builds/438477720

This issue isn't affecting CRAN; the tests are probably skipped there. Biocache field changes are the likely culprits.

Failure: occurrence_details result has the expected fields (@test-occurrence-details.R#41)
Failure: occurrence_details result has the expected fields (@test-occurrence-details.R#44)
Failure: sites_by_species works as expected (@test-sites_by_species.R#9)

Develop ALA4R training materials

Prepare presentation/training materials highlighting basic usage and case studies.

sites_by_species can fail on valid WKT

See AtlasOfLivingAustralia/biocache-service#225

Make package skeleton

NBN4R (https://github.com/BiologicalRecordsCentre/NBN4R) is a wrapper package around ALA4R. It changes the server URL addresses and other ALA-specific settings (see onload.R) and creates cosmetic re-naming of ALA functions so that they appear as NBN-named functions.
We should make a skeleton package like this so that other national installations that want their own R package can use it as a starting point.

search_names encoding

Just noticed that encoding seems wrong in some cases for the result of search_names.

E.g. see the matched name of the following:

search_names('Simoselaps fasciolatus')

#                searchTerm                                    name                                            commonName    rank
#  1 Simoselaps fasciolatus Simoselaps fasciolatus (GÃ¼nther, 1872) Narrow-banded Shovel-nosed Snake, Narrow-banded Snake species
#                                                                           guid
#  1 urn:lsid:biodiversity.org.au:afd.taxon:96c7acad-0e7d-436e-acb1-a069afd581db
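`GÃ¼nther` is what the UTF-8 bytes for `ü` look like when they are decoded as Latin-1, which suggests the response is being read with the wrong encoding somewhere. The cleaner fix is to decode responses as UTF-8 when reading them (for example `httr::content(resp, as = "text", encoding = "UTF-8")`, if httr is what reads the response). The sketch below only illustrates repairing such strings after the fact; `repair_mojibake()` is hypothetical.

```r
# Undo UTF-8-read-as-Latin-1 mojibake: convert the characters back to their
# single-byte Latin-1 values, then declare those bytes to be UTF-8.
# Strings that do not round-trip to valid UTF-8 are returned unchanged.
repair_mojibake <- function(x) {
  bytes <- iconv(x, from = "UTF-8", to = "latin1")
  Encoding(bytes) <- "UTF-8"
  ifelse(is.na(bytes) | !validUTF8(bytes), x, bytes)
}
```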

ALA download requests - add sourceTypeId param

When performing an ALA records download, add an extra request param:

sourceTypeId=2001

which corresponds to ALA4R source, as defined at http://logger.ala.org.au/service/logger/sources
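A sketch of assembling the download request with the extra parameter, kept optional so it can be dropped for instances that reject it (see the report on European Atlas instances). `build_download_url()` is hypothetical. It only builds the q, reasonTypeId and sourceTypeId parameters from the request form shown in that report (esc, sep and file are omitted) and uses base R's URLencode().

```r
# Hypothetical helper: assemble an occurrence download URL, adding
# sourceTypeId (2001 = ALA4R) unless the caller passes NULL.
build_download_url <- function(base_url, query, reason_type_id, source_type_id = 2001) {
  params <- list(q = query, reasonTypeId = reason_type_id)
  if (!is.null(source_type_id)) params$sourceTypeId <- source_type_id
  # Percent-encode each value, including reserved characters
  encoded <- vapply(params, function(v) utils::URLencode(as.character(v), reserved = TRUE),
                    character(1))
  paste0(base_url, "occurrences/index/download?",
         paste(names(encoded), encoded, sep = "=", collapse = "&"))
}
```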

ala_list() test failing

error thrown

test-lists.R:26: failure: ala_list does stuff
Names of `l`  ('id', 'name', 'commonName', 'scientificName', 'lsid', 'dataResourceUid', 'kvpValues') 
don't match    'id', 'name', 'commonName', 'scientificName', 'lsid', 'kvpValues'

problem url
https://lists.ala.org.au/ws/speciesListItems/dr1146?includeKVP=true

problem code in test-lists.R:

l <- ala_list(druid="dr1146")
expect_named(l,c("id","name","commonName","scientificName","lsid","kvpValues"))

Summary: the web service now returns a new field, dataResourceUid (this test is skipped on CRAN).

Unexpected species match with `search_names`

Searching for "Melaleuca fluviatilis" returns a match to Melaleuca fluviatilis x M.nervosa, despite Melaleuca fluviatilis existing in ALA.

search_names('Melaleuca fluviatilis')
#              searchTerm                              name commonName    rank                                      guid
#1 Melaleuca fluviatilis Melaleuca fluviatilis x M.nervosa         NA unknown     ALA_Melaleuca_fluviatilis_x_M.nervosa

In case it's relevant, my code suggests that in the past, the LSID for this species was urn:lsid:biodiversity.org.au:apni.taxon:251180.
