urbanccd-uchicago / plenario-client-py
Official Python Client for the Plenario Datasets API
License: Other
I want to be able to quickly grep logs for client usage on the app. Add a User-Agent
header to the requests. Something like plenario-client-py
... I dunno. If you've got a better name then use that.
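A minimal sketch of what this could look like with the stdlib's urllib (a requests session header would work the same way); "plenario-client-py" is the name suggested above, and the version string is illustrative:

```python
import urllib.request

# Name suggested in this issue; the version number is illustrative.
USER_AGENT = "plenario-client-py/0.1.0"

def build_request(url: str) -> urllib.request.Request:
    """Attach the client-identifying User-Agent so server logs are greppable."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```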
Story #4
This is the initial stage of building the client. For this, we need:

Responses are paginated by the server, and the server provides links in the meta object of the response on how to paginate. Users need to have a clean way to page through the data. meta and data need to be dropped down a level into a page object. The parent Response needs to implement an iterator that returns pages, and on each subsequent page sends a request for the next page. The iterator stops when the meta.links.next value is null.
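The iteration described above could be sketched as a generator; `fetch` here is a hypothetical injected callable that maps a URL to the parsed JSON body:

```python
def iter_pages(first_url, fetch):
    """Yield {meta, data} pages, following meta.links.next until it is null.

    `fetch` is a hypothetical callable returning the parsed JSON body for a URL.
    """
    url = first_url
    while url is not None:
        body = fetch(url)
        # meta and data are dropped down a level into a page object
        yield {"meta": body["meta"], "data": body.get("data", [])}
        url = body["meta"]["links"]["next"]
```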
Story #3
Following up on #16, implement this library. Descriptions include bounding boxes that need to be cast to objects, and user-created geometries need to be cast to GeoJSON strings for filtering. Cast Description.bbox to a Polygon, and make sure geometries are properly serialized as strings in the parameters.
Story #4
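A sketch of both casts, assuming a bbox given as (min_x, min_y, max_x, max_y); the helper names are hypothetical:

```python
import json

def bbox_to_polygon(min_x, min_y, max_x, max_y):
    """Cast a bounding box to a GeoJSON Polygon object (closed ring)."""
    ring = [[min_x, min_y], [min_x, max_y], [max_x, max_y],
            [max_x, min_y], [min_x, min_y]]
    return {"type": "Polygon", "coordinates": [ring]}

def geom_param(geom):
    """Serialize a geometry to a GeoJSON string for use as a query parameter."""
    return json.dumps(geom)
```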
All responses will contain a meta block. This class will handle the information contained in that object. This is a critical piece of the response and how the users will interact with the API, especially pagination and understanding query structure.

The meta object of the response will look like:
{
"links": {
"current": "a self-referential link",
"previous": "a link to the previous page of records",
"next": "a link to the next page of records"
},
"params": {
"page_size": "the maximum limit of records in a page",
"page": "the current page number",
"order_by": [
"direction",
"field"
]
},
"counts": {
"total_pages": "the total number of pages of records",
"total_records": "the total number of records in all the pages",
"data": "the number of records in the data portion of the response",
"errors": "the number of errors in the errors portion of the response"
}
}
links, params, and counts must all be available to the user. Within links and counts, all of the example field keys must be present. params will be dynamically built using the response -- it's based on the query params sent in the request.
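A sketch of such a Meta class, keeping the dynamic params as a plain dict (the class shape is an assumption, not a settled design):

```python
from types import SimpleNamespace

class Meta:
    """Wrap the response's meta block: links, params, and counts."""

    def __init__(self, payload):
        self.links = SimpleNamespace(**payload["links"])
        self.counts = SimpleNamespace(**payload["counts"])
        # params mirror whatever query params were sent, so keep them dynamic
        self.params = dict(payload["params"])
```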
Story #2
The vast majority of responses will contain data. What lands in this object is widely subjective and basically impossible to account for ahead of time, given that the data sets are only known when they are accessed and are updated continuously.
In order to handle the wide variety of response data, we need to think through what is the best way that we can wrap all of this up -- we need to consider the use cases for this data.
Use cases: to me there are two that come to mind -- general, manual exploration (typically via a Jupyter notebook), and automated, large scale data processing (either via Jupyter or some other system).
For the first case, it would make sense to return the data in some sort of named fashion: either a dictionary or namedtuple would work nicely here. I would think the tuples would be best, but we need to take into account that they only work with alphanumeric values for keys -- if a field is named My Field it would need to be snaked into my_field.
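A sketch of that snaking, plus using it to build a row type at runtime (the helper name and extra `f_` prefix rule are assumptions):

```python
import keyword
import re
from collections import namedtuple

def snake(name):
    """Turn a field name like 'My Field' into a valid attribute name."""
    s = re.sub(r"\W+", "_", name).strip("_").lower()
    if not s or s[0].isdigit() or keyword.iskeyword(s):
        s = "f_" + s  # namedtuple fields must be valid, non-keyword identifiers
    return s

# Field names are only known once the data set is fetched
Row = namedtuple("Row", [snake(f) for f in ["My Field", "Observed At"]])
```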
For the second case, I would think that more serious analysis would be done using one of the data/scientific libraries (pandas, numpy, etc). In these cases, we should load the data directly into the appropriate container (DataFrame, NDArray, whatever).
Of course, this would involve getting fancy in both how the user wants to parse the response and how we package the client (but that's something I know how to do, and it's not that bad).
Story #2
We need to marshal values from a JSON object to a native object. TimeRanges and their use would be a great addition to providing users with a clean and intuitive interface.
Time ranges consist of lower and upper bounds and lower and upper inclusive booleans. See https://github.com/UrbanCCD-UChicago/plenario2/blob/master/lib/postgres_extensions.ex#L7 for some detailed information about their use in the API.
Specific to this implementation, we need to be able to check for multiple conditions and functionality as implemented by:
__lt__
__le__
__ge__
__gt__
__eq__
__contains__
__str__
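A sketch of a few of the listed dunders on a minimal TimeRange (the dataclass supplies __eq__; the ordering dunders would follow the same comparison pattern):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class TimeRange:
    """Lower/upper bounds plus inclusivity flags, as described above."""
    lower: datetime
    upper: datetime
    lower_inclusive: bool = True
    upper_inclusive: bool = False

    def __contains__(self, ts):
        above = ts >= self.lower if self.lower_inclusive else ts > self.lower
        below = ts <= self.upper if self.upper_inclusive else ts < self.upper
        return above and below

    def __str__(self):
        lo = "[" if self.lower_inclusive else "("
        hi = "]" if self.upper_inclusive else ")"
        return f"{lo}{self.lower.isoformat()}, {self.upper.isoformat()}{hi}"
```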
Also note that due to the permissive nature of ranges, we need to normalize some values. Here are the assumptions that you can make: seconds is the least significant value, so always round to that.

Regarding lower/upper exclusivity, here's an example:
{
  "lower": "2018-01-01T00:00:00",
  "upper": "2019-01-01T00:00:00",
  "lower_inclusive": true,
  "upper_inclusive": false
}
is equal to
{
  "lower": "2018-01-01T00:00:00",
  "upper": "2018-12-31T23:59:59",
  "lower_inclusive": true,
  "upper_inclusive": true
}
Of course, given the stated assumptions above, we would want to normalize the second object to be the first when parsing the JSON payload.
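A sketch of that normalization, rounding to seconds and canonicalizing every range to inclusive-lower/exclusive-upper form (the function name is hypothetical):

```python
from datetime import datetime, timedelta

ISO = "%Y-%m-%dT%H:%M:%S"

def normalize(rng):
    """Shift bounds by one second (the least significant unit) so every
    range ends up lower-inclusive and upper-exclusive."""
    lower = datetime.strptime(rng["lower"], ISO)
    upper = datetime.strptime(rng["upper"], ISO)
    if not rng["lower_inclusive"]:
        lower += timedelta(seconds=1)
    if rng["upper_inclusive"]:
        upper += timedelta(seconds=1)
    return {"lower": lower.strftime(ISO), "upper": upper.strftime(ISO),
            "lower_inclusive": True, "upper_inclusive": False}
```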
Story #4
Responses will always contain meta parts; they could have either data or errors. We need to be able to parse those different parts of the body.
Epic #1
In several places in the Description class we should be parsing the timestamp strings to proper datetime objects.
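A sketch, assuming the API's timestamps are ISO-8601 strings (the helper name and the None pass-through are assumptions):

```python
from datetime import datetime

def parse_timestamp(value):
    """Parse an ISO-8601 timestamp string into a datetime; pass None through."""
    return None if value is None else datetime.fromisoformat(value)
```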
Story #4
This is a research task. Come up with a short list of libs that will correctly handle GeoJSON and cast values into proper Geometries.
Story #4
For requests that error out, we need to provide users with some sort of accessible means to the error information.

In the response body, errors will be a list of strings. Most typically, it will be a single string, as the server should halt on the first error it encounters.

I'm open to this either being an object in its own right and deferring error alerting to the response class (built later), or making this a custom Exception that is raised. I'm leaning more toward it being an exception, but I can see some marginally useful cases for wrapping it as an object and deferring raising it until later.
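If the exception path wins, it could be as small as this (the class name is hypothetical):

```python
class ApiError(Exception):
    """Carries the response's errors list; the message joins them for display."""

    def __init__(self, errors):
        self.errors = list(errors)
        super().__init__("; ".join(self.errors))
```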
Story #2
Start with the basics on this one.

The response comes back as JSON. It will always have meta information that can be loaded into a Meta object. It will contain either errors, which, depending on the path we took with those, will need to either be loaded into an object or raised as an exception, or it will contain data. Data then needs to be marshalled into whatever we built to wrap that.
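The branching above could be sketched as follows (names hypothetical; this takes the raise-an-exception option for errors, using a plain ValueError to stay self-contained):

```python
def parse_response(body):
    """Split a JSON response body into meta and data, raising on errors."""
    meta = body["meta"]  # always present
    errors = body.get("errors")
    if errors:
        raise ValueError("; ".join(errors))
    return meta, body.get("data", [])
```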
Story #3
There are 4 different request types that can be sent to the API:
Each of these has its own unique set of parameters and endpoint identifiers. They also have common patterns, like pagination and resource base routes.
We need to build out some sort of workflow for accessing data sets via these request types.
Epic #1