sckott / bienapi

BIEN REST API

Home Page: https://bienapi.xyz/

License: MIT License

Ruby 97.93% HTML 1.45% Dockerfile 0.62%

rest-api api species traits

bienapi's Introduction

BIEN API

See the API Docs to get started: https://docs.bienapi.xyz/

bienapi's People

andres-lorenzatti-olx

bienapi's Issues

/plot/protocols problem

Route /plot/protocols. Results return plot_metadata_id as all NAs:

res <- cli$get("plot/protocols")
jsonlite::fromJSON(res$parse("UTF-8"))$data

Omit "plot_metadata_id" from the results, assuming this route is a shortcut for the following query:

SELECT DISTINCT sampling_protocol
FROM plot_metadata;

I think it'd be better to have separate API keys for each person. One key for everyone seems to me the same as no key at all, so we might as well issue a separate key per person. That would give a better sense of usage per person, and would let us throttle people who use the API too heavily.
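A minimal sketch of per-person keys, assuming each key is a random token stored only as a SHA-256 digest and that every authenticated request is counted against its owner. The class and method names are hypothetical, and storage is an in-memory Hash rather than the API's database.

```ruby
require "securerandom"
require "digest"

# Hypothetical sketch: one API key per person, stored only as a SHA-256
# digest, with a per-person request counter for usage reporting.
class ApiKeys
  def initialize
    @owners_by_digest = {}   # SHA-256 hex digest of key => owner
    @requests = Hash.new(0)  # owner => number of authenticated requests
  end

  # Issue a new random key for a person; only its digest is kept.
  def issue(owner)
    key = SecureRandom.hex(20) # 40 hex characters
    @owners_by_digest[Digest::SHA256.hexdigest(key)] = owner
    key
  end

  # Look up the owner of a presented key and count the request.
  # Returns nil for a missing or unknown key.
  def authenticate(key)
    return nil if key.nil? || key.empty?

    owner = @owners_by_digest[Digest::SHA256.hexdigest(key)]
    @requests[owner] += 1 if owner
    owner
  end

  def request_count(owner)
    @requests[owner]
  end
end
```

Per-owner counts are what a throttle would then be keyed on; a request with an unknown key would be rejected before it reaches any route.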

/taxonomy/species/ route with weird error

Haven't been able to track it down yet.

The traceback includes /home/cc/bienapi/api.rb:331:in `block in <class:API>'

remove iucn/usda routes

routes to do

occurrence routes

/occurrence/spatial/ Extract occurrence data for specified polygons (WKT) or bounding box (~ BIEN::BIEN_occurrence_spatialpolygons)
/occurrence/state/ Extract occurrence data for a state (~ BIEN::BIEN_occurrence_state)
/occurrence/county/ Extract occurrence data for a county (~ BIEN::BIEN_occurrence_county)
/occurrence/country/ Extract occurrence data for a country (~ BIEN::BIEN_occurrence_country)
/occurrence/count/ Count the number of (geoValid) occurrence records for each species in BIEN (~ BIEN::BIEN_occurrence_records_per_species)
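/occurrence/spatial/ is meant to accept either a WKT polygon or a bounding box. A small helper could turn a bounding box into the same kind of WKT polygon used in the slow example query in the long-running-requests issue, so only one code path reaches Postgres. The helper name and argument order (min_lon, min_lat, max_lon, max_lat) are assumptions; WKT puts longitude before latitude, and the ring must repeat its first point to close.

```ruby
# Hypothetical helper: convert a bounding box into a closed WKT polygon.
# Corner order mirrors the example polygon in the slow-query issue.
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat)
  raise ArgumentError, "min_lon must be < max_lon" unless min_lon < max_lon
  raise ArgumentError, "min_lat must be < max_lat" unless min_lat < max_lat

  ring = [
    [min_lon, max_lat], [max_lon, max_lat],
    [max_lon, min_lat], [min_lon, min_lat],
    [min_lon, max_lat] # repeat the first point to close the ring
  ]
  "POLYGON((#{ring.map { |lon, lat| "#{lon} #{lat}" }.join(',')}))"
end
```

The result can be prefixed with `SRID=4326;` before passing it to `ST_GeographyFromText`, as the example query does.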

species list routes

/list/county/ Extract species list by county
/list/state/ Extract a species list by state/province
/list/spatial/ Extract a list of species within a given WKT

plot routes

/plot/country/ Get plot data from specified countries
/plot/dataset/ Get plot data by dataset name
/plot/datasources/ List available data sources
/plot/datasources/<data source name> Get plot data by data source name
/plot/protocols/<protocol name> Get plot data by protocol name
/plot/name/<plot name> Get plot data by plot name (~ BIEN::BIEN_plot_name)
/plot/state/ Get plot data from specified states/provinces

ranges routes

/ranges/species/intersect/ Get range maps that intersect the range of a species (~ BIEN::BIEN_ranges_intersect_species)

trait routes

/traits/family/<trait> Extract specific trait data for given families
/traits/genus/ Extract all trait data for given genera
/traits/genus/<trait> Extract specific trait data for given genera
/traits/species/ Extract all trait data for given species
/traits/species/<trait> Extract specific trait data for given species
/traits/trait/ Extract all measurements for a trait
/traits/count/ Count the number of trait observations for each species in the BIEN database

count in API is off for some routes

It often returns 1 when there are definitely more results than that.
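One plausible cause, assuming the counts are produced by SQL shaped like the count query in the /list/country error message (SELECT COUNT(DISTINCT count_column) FROM (SELECT DISTINCT 1 AS count_column ...) subquery_for_count): every row is replaced by the constant 1 before DISTINCT is applied, so the count can only ever be 0 or 1. The plain-Ruby illustration below mimics that shape on made-up rows; it is not the API's code.

```ruby
# Illustration only: made-up rows standing in for query results.
rows = [
  { "scrubbed_species_binomial" => "Acer rubrum" },
  { "scrubbed_species_binomial" => "Acer saccharum" },
  { "scrubbed_species_binomial" => "Betula papyrifera" }
]

# Shape of the suspect query: SELECT DISTINCT 1 AS count_column, then
# COUNT(DISTINCT count_column). Every row becomes 1, so at most one survives.
broken_count = rows.map { 1 }.uniq.size

# Counting distinct values of the column actually selected instead.
fixed_count = rows.map { |r| r["scrubbed_species_binomial"] }.uniq.size

puts broken_count # prints 1
puts fixed_count  # prints 3
```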

Unsupported media type in serve_data method: make the API fail early in those cases

Probably put the check in the before block.
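A minimal sketch of the early check, assuming the API supports only JSON and CSV responses: pick a supported type from the Accept header, or return nil so a before filter can halt. The helper name and supported list are hypothetical. When the mismatch is in the Accept header, 406 Not Acceptable is the conventional status; 415 Unsupported Media Type is meant for request bodies the server can't handle.

```ruby
# Hypothetical before-filter helper. In a Sinatra before block it might be:
#   halt 406 unless (@format = negotiate(request.env["HTTP_ACCEPT"]))
SUPPORTED_TYPES = %w[application/json text/csv].freeze

def negotiate(accept_header, supported = SUPPORTED_TYPES)
  # No Accept header: fall back to the default (first) type.
  return supported.first if accept_header.nil? || accept_header.strip.empty?

  ranges = accept_header.split(",").filter_map do |part|
    type, *params = part.split(";").map(&:strip)
    next if type.nil? || type.empty?

    q_param = params.find { |p| p.start_with?("q=") }
    [type.downcase, q_param ? q_param.delete_prefix("q=").to_f : 1.0]
  end

  # Highest q first; the index keeps header order for equal q values.
  ranges.each_with_index.sort_by { |(_, q), i| [-q, i] }.each do |(type, q), _|
    next if q <= 0
    return supported.first if type == "*/*"

    prefix = type.end_with?("/*") ? type.chomp("*") : nil
    match = supported.find { |s| s == type || (prefix && s.start_with?(prefix)) }
    return match if match
  end
  nil # nothing acceptable: reject before any query runs
end
```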

rate limiting

Caddy has a rate limit plugin, but I'm not sure it's working. Probably not a big deal in the early days of public usage.
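If the Caddy plugin turns out not to work, throttling could also live in the app. A minimal sketch, assuming an in-process token bucket per client (for example per API key); the class name, capacity and refill rate are made up. With several Unicorn workers each process would keep its own buckets, so a global limit would need a shared store.

```ruby
# Hypothetical in-process token-bucket limiter keyed by client id.
class TokenBucketLimiter
  def initialize(capacity:, refill_per_second:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @capacity = capacity.to_f
    @rate = refill_per_second.to_f
    @clock = clock
    @buckets = {} # client id => [tokens, time of last update]
  end

  # true if the request may proceed, false if it should get a 429.
  def allow?(client)
    now = @clock.call
    tokens, last = @buckets.fetch(client, [@capacity, now])
    tokens = [@capacity, tokens + (now - last) * @rate].min # refill since last call
    if tokens >= 1
      @buckets[client] = [tokens - 1, now]
      true
    else
      @buckets[client] = [tokens, now]
      false
    end
  end
end
```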

fix unexpected 403s

https://stackoverflow.com/questions/10509774/sinatra-and-rack-protection-setting/16125324#16125324

potential offset param problem on /list/country route

res <- cli$get("list/country", query = list(country = "Canada", limit=1000))

Everything is fine, but I get an error when I try adjusting the offset:

res <- cli$get("list/country", query = list(country = "Canada", offset=10))

The error is a 400, and the message is:

PG::InvalidColumnReference: ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 4:             AND is_new_world = 1) ORDER BY scrubbed_species_...
                                                   ^
: SELECT COUNT(DISTINCT count_column) FROM (SELECT DISTINCT 1 AS count_column FROM "species_by_political_division" WHERE (country in ('Canada')
            AND scrubbed_species_binomial IS NOT NULL
            AND (is_cultivated = 0 OR is_cultivated IS NULL)
            AND is_new_world = 1) ORDER BY scrubbed_species_binomial OFFSET $1) subquery_for_count
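Postgres rejects this count query because its subquery combines SELECT DISTINCT 1 with ORDER BY scrubbed_species_binomial, and with DISTINCT every ORDER BY expression must appear in the select list. One way around it, sketched here under assumptions, is to build the page query and the total-count query separately: the page query selects the ordered column itself, and the count query drops ORDER BY, LIMIT and OFFSET entirely. The helper name is hypothetical, and where_sql is assumed to be a trusted fragment with user values passed as bind parameters.

```ruby
# Hypothetical sketch: separate page and count queries for /list/country.
# Table and column names come from the error message above.
def species_list_sql(where_sql)
  base = "FROM species_by_political_division WHERE #{where_sql}"
  {
    # DISTINCT on the same column used in ORDER BY satisfies Postgres's rule.
    page: "SELECT DISTINCT scrubbed_species_binomial #{base} " \
          "ORDER BY scrubbed_species_binomial LIMIT $1 OFFSET $2",
    # The total ignores ordering and paging and counts real distinct values.
    count: "SELECT COUNT(DISTINCT scrubbed_species_binomial) #{base}"
  }
end
```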

versioning: via header
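A minimal sketch of header-based versioning, assuming a vendor media type in the Accept header such as application/vnd.bienapi.v2+json. That media type, the default version and the method name are all hypothetical; a custom version header would work the same way.

```ruby
# Hypothetical: read the requested API version from a vendor media type,
# falling back to a default when none is given.
DEFAULT_API_VERSION = 1

def requested_version(accept_header)
  match = accept_header.to_s.match(%r{application/vnd\.bienapi\.v(\d+)\+json})
  match ? match[1].to_i : DEFAULT_API_VERSION
end
```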

Add HSTS header

https://www.owasp.org/index.php/HTTP_Strict_Transport_Security_Cheat_Sheet

e.g. Strict-Transport-Security: max-age=86400; includeSubDomains

to help prevent SSL-stripping attacks
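The header could be set by Caddy or by the Ruby app. A sketch of the app-side option as plain Rack-style middleware (no gems needed to run it), using the value from the example above; it would be mounted with `use StrictTransportSecurity` in config.ru. The class name is hypothetical. Browsers only honor the header when it arrives over HTTPS.

```ruby
# Hypothetical Rack-style middleware that adds an HSTS header to every response.
class StrictTransportSecurity
  HEADER_VALUE = "max-age=86400; includeSubDomains".freeze

  def initialize(app, value: HEADER_VALUE)
    @app = app
    @value = value
  end

  def call(env)
    status, headers, body = @app.call(env)
    # Lowercase header name: accepted by Rack 2 and required by Rack 3.
    [status, headers.merge("strict-transport-security" => @value), body]
  end
end
```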

make heartbeat reflect real routes

The contents of /heartbeat are manually curated right now, which is a bad idea: they won't reflect the actual routes if I forget to update them. Possibly try https://stackoverflow.com/questions/13694058/how-to-get-a-list-of-all-routes-used-in-a-sinatra-app#13788616
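An alternative to reading Sinatra's internal route table is to register routes explicitly as they are defined and build /heartbeat from that registry. A plain-Ruby sketch of the idea; the class and method names are hypothetical and not Sinatra's API.

```ruby
# Hypothetical sketch: every route records itself when defined, so the
# /heartbeat payload comes from the same list that serves requests.
class RouteRegistry
  def initialize
    @handlers = {} # [verb, path] => handler block, in definition order
  end

  # Define a GET route and remember it.
  def get(path, &handler)
    @handlers[["GET", path]] = handler
  end

  # Dispatch a request to its handler; nil if no such route.
  def call(verb, path)
    @handlers[[verb, path]]&.call
  end

  # What /heartbeat would return: generated, so it cannot drift.
  def heartbeat
    { routes: @handlers.keys.map { |verb, path| "#{verb} #{path}" } }
  end
end
```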

DRY out route def's in api.rb

Could put all models in models/models.rb under a Models module, I think.

auth: implement api key requirement

view_full_occurrence_individual relation not found

Not sure what happened. Possibly it didn't load correctly from the dump, though I see it listed when running pg_restore -l <file>

/plot route problem

Route /plot. If the fields parameter is used without including plot_metadata_id,
plot_metadata_id is returned as all NA. If you wish to make plot_metadata_id non-optional,
ensure that it is always populated. E.g., compare

res <- cli$get("plot/metadata?fields= plot_name,country")
jsonlite::fromJSON(res$parse("UTF-8"))$data
res <- cli$get("plot/metadata?fields= plot_metadata_id,plot_name,country")
jsonlite::fromJSON(res$parse("UTF-8"))$data
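One way to keep plot_metadata_id populated is to normalize the fields parameter before building the query: trim whitespace (the requests above send "fields= plot_name,country") and always prepend the id column. A sketch under assumptions; the method name and column whitelist are hypothetical.

```ruby
# Hypothetical whitelist of selectable plot metadata columns.
PLOT_METADATA_COLUMNS = %w[plot_metadata_id plot_name country sampling_protocol].freeze

def plot_fields(param, id_column: "plot_metadata_id", allowed: PLOT_METADATA_COLUMNS)
  requested = param.to_s.split(",").map(&:strip).reject(&:empty?)
  requested = allowed if requested.empty? # no fields given => all columns
  unknown = requested - allowed
  raise ArgumentError, "unknown fields: #{unknown.join(', ')}" unless unknown.empty?

  ([id_column] + requested).uniq # id always first, never duplicated
end
```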

blarg, ruby problem with slice() method

routes with very long running postgres requests

Some routes

/occurrence/species ~ BIEN::BIEN_occurrence_species
/occurrence/genus ~ BIEN::BIEN_occurrence_genus
/occurrence/family ~ BIEN::BIEN_occurrence_family
/occurrence/spatial ~ BIEN::BIEN_occurrence_spatialpolygons
/occurrence/count ~ BIEN::BIEN_occurrence_records_per_species

sometimes take a very long time to run. This isn't just a Unicorn or Caddy server issue: I've checked that the request takes a long time on the Postgres side. There appear to be indices on the table view_full_occurrence_individual, so I assume that isn't the cause.

Thoughts, @ojalaquellueva?

I can send some example Postgres queries behind the API requests so you can try them on your server and see whether they also take a long time. If they just take a long time and there's no way to speed them up, we may need to serve these long-running requests in a separate async service so they don't bog down the main API.

e.g., query that takes a long time:

SELECT scrubbed_species_binomial, latitude, longitude,date_collected,datasource,dataset,dataowner,custodial_institution_codes,collection_code,a.datasource_id     
	FROM (
		SELECT * FROM view_full_occurrence_individual 
		WHERE higher_plant_group IS NOT NULL AND is_geovalid =1 
			AND latitude BETWEEN  27.31 AND 37.29 
			AND longitude BETWEEN  -117.13  AND  -108.62 
	) a
	WHERE st_intersects(ST_GeographyFromText('SRID=4326; POLYGON((-114.125 34.230,-112.346 34.230,-112.346 32.450,-114.125 32.450,-114.125 34.230)) '),a.geom) 
	AND (is_cultivated = 0 OR is_cultivated IS NULL) 
	AND is_new_world = 1  
	AND ( native_status IS NULL OR native_status NOT IN ( 'I', 'Ie' ) ) 
	AND higher_plant_group IS NOT NULL 
	AND (is_geovalid = 1 OR is_geovalid IS NULL) 
	ORDER BY scrubbed_species_binomial;

On my server: this query takes more than 5 minutes (I didn't wait for it to finish).
On vegbiendev.nceas.ucsb.edu: it takes about 1 min 20 sec.

Even the shorter time on vegbiendev.nceas.ucsb.edu is too long for a normal REST API route. Could these longer queries be sped up, perhaps with additional indices? Not sure why my server and vegbiendev.nceas.ucsb.edu differ; they must be set up differently.
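If some queries stay slow, the async option could look like this: the route enqueues a job and returns its id right away, and the client polls a status route for the result. A minimal sketch, assuming in-process worker threads purely for illustration; a real deployment would more likely use a separate worker process and a shared store. The class and method names are hypothetical.

```ruby
require "securerandom"

# Hypothetical async job runner for long-running queries.
class JobQueue
  def initialize(workers: 2)
    @jobs = {}          # job id => { status:, result: or error: }
    @lock = Mutex.new
    @queue = Queue.new
    @threads = Array.new(workers) { Thread.new { work } }
  end

  # Enqueue a task and return its id immediately.
  def submit(&task)
    id = SecureRandom.uuid
    @lock.synchronize { @jobs[id] = { status: :queued } }
    @queue << [id, task]
    id
  end

  # Snapshot of a job's state, or nil for an unknown id.
  def status(id)
    @lock.synchronize { @jobs[id]&.dup }
  end

  private

  def work
    loop do
      id, task = @queue.pop
      update(id, status: :running)
      begin
        update(id, status: :done, result: task.call)
      rescue StandardError => e
        update(id, status: :failed, error: e.message)
      end
    end
  end

  def update(id, attrs)
    @lock.synchronize { @jobs[id] = attrs }
  end
end
```

Separately, Postgres's statement_timeout setting can cap how long any single query issued by the API is allowed to run.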

add csv option

Via content negotiation (request header Accept: text/csv, response header Content-Type: text/csv)

formatted according to https://tools.ietf.org/html/rfc4180
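A minimal sketch of the serialization step, assuming result rows arrive as an array of hashes (as they would before JSON encoding). RFC 4180 uses CRLF line breaks and quotes fields containing commas; Ruby's CSV library handles the quoting, and row_sep makes the CRLF explicit. The method name is hypothetical.

```ruby
require "csv"

# Hypothetical: serialize result rows to RFC 4180-style CSV with a header line.
def rows_to_csv(rows)
  return "" if rows.empty?

  headers = rows.first.keys
  CSV.generate(row_sep: "\r\n") do |csv|
    csv << headers
    rows.each { |row| csv << headers.map { |h| row[h] } } # nil => empty field
  end
end
```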

/stems problem

/stem returns NA for all analytical_stem_id values. This should not be possible when returning individual rows from the analytical_stem table. E.g.,

res <- cli$get("stem/species", query = list(species = "Lysimachia quadrifolia", fields="datasource_id, scrubbed_species_binomial, analytical_stem_id, cover_percent"))
jsonlite::fromJSON(res$parse("UTF-8"))$data