pyinaturalist's Introduction

pyinaturalist


Introduction

iNaturalist is a community science platform that helps people get involved in the natural world by observing and identifying the living things around them. Collectively, the community produces a rich source of global biodiversity data that can be valuable to anyone from hobbyists to scientists.

pyinaturalist is a client for the iNaturalist API that makes these data easily accessible in the python programming language.

Features

  • ➡️ Easier requests: Simplified request formats, easy pagination, and complete request parameter type annotations for better IDE integration
  • ⬅️ Convenient responses: Type conversions to the things you would expect in python, and an optional object-oriented interface for response data
  • 🔒 Security: Keyring integration for secure credential storage
  • 📗 Docs: Example requests, responses, scripts, and Jupyter notebooks to help get you started
  • 💚 Responsible use: Follows the API Recommended Practices by default, so you can be nice to the iNaturalist servers and not worry about rate-limiting errors
  • 🧪 Testing: A dry-run testing mode to preview your requests before potentially modifying data

Supported Endpoints

Many of the most relevant API endpoints are supported, including:

  • 📝 Annotations and observation fields
  • 🆔 Identifications
  • 💬 Messages
  • 👀 Observations (multiple formats)
  • 📷 Observation photos + sounds
  • 📊 Observation histograms, observers, identifiers, life lists, and species counts
  • 📍 Places
  • 👥 Projects
  • 🐦 Species
  • 👤 Users

Quickstart

Here are usage examples for some of the most commonly used features.

First, install with pip:

pip install pyinaturalist

Then, import the main API functions:

from pyinaturalist import *

Search observations

Let's start by searching for all your own observations. There are numerous fields you can search on, but we'll just use user_id for now:

>>> observations = get_observations(user_id='my_username')

The full response will be in JSON format, but we can use pyinaturalist.pprint() to print out a summary:

>>> for obs in observations['results']:
...     pprint(obs)
ID         Taxon                               Observed on   User     Location
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
117585709  Genus: Hyoscyamus (henbanes)        May 18, 2022  niconoe  Calvi, France
117464920  Genus: Omophlus                     May 17, 2022  niconoe  Galéria, France
117464393  Genus: Briza (Rattlesnake Grasses)  May 17, 2022  niconoe  Galéria, France
...

You can also get observation counts by species. On iNaturalist.org, this information can be found on the 'Species' tab of search results. For example, to get species counts of all your own research-grade observations:

>>> counts = get_observation_species_counts(user_id='my_username', quality_grade='research')
>>> pprint(counts)
 ID     Rank      Scientific name               Common name             Count
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
47934   species   🐛 Libellula luctuosa         Widow Skimmer           7
48627   species   🌻 Echinacea purpurea         Purple Coneflower       6
504060  species   🍄 Pleurotus citrinopileatus  Golden Oyster Mushroom  6
...

Another useful format is the observation histogram, which shows the number of observations over a given interval. The default is month_of_year:

>>> histogram = get_observation_histogram(user_id='my_username')
>>> print(histogram)
{
    1: 8,  # January
    2: 1,  # February
    3: 19, # March
    ...,   # etc.
}

Create and update observations

To create or modify observations, you will first need to log in. This requires creating an iNaturalist app, which will be used to get an access token.

token = get_access_token(
    username='my_username',
    password='my_password',
    app_id='my_app_id',
    app_secret='my_app_secret',
)

See Authentication for more options including environment variables, keyrings, and password managers.

Now we can create a new observation:

from datetime import datetime

response = create_observation(
    taxon_id=54327,  # Vespa crabro
    observed_on_string=datetime.now(),
    time_zone='Brussels',
    description='This is a free text comment for the observation',
    tag_list='wasp, Belgium',
    latitude=50.647143,
    longitude=4.360216,
    positional_accuracy=50,  # GPS accuracy in meters
    access_token=token,
    photos=['~/observations/wasp1.jpg', '~/observations/wasp2.jpg'],
)

# Save the new observation ID
new_observation_id = response[0]['id']

We can then update the observation information, photos, or sounds:

update_observation(
    new_observation_id,
    access_token=token,
    description='updated description !',
    photos='~/observations/wasp_nest.jpg',
    sounds='~/observations/wasp_nest.mp3',
)

Search species

Let's say you partially remember either a genus or family name that started with 'vespi'-something. The taxa endpoint can be used to search by name, rank, and several other criteria:

>>> response = get_taxa(q='vespi', rank=['genus', 'family'])

As with observations, there is a lot of information in the response, but we'll print just a few basic details:

>>> pprint(response)
[52747] Family: Vespidae (Hornets, Paper Wasps, Potter Wasps, and Allies)
[92786] Genus: Vespicula
[84737] Genus: Vespina
...

Next Steps

For more information, see:

  • User Guide: introduction and general features that apply to most endpoints
  • Endpoint Summary: a complete list of endpoints wrapped by pyinaturalist
  • Examples: data visualizations and other examples of things to do with iNaturalist data
  • Reference: Detailed API documentation
  • Contributing Guide: development details for anyone interested in contributing to pyinaturalist
  • History: details on past and current releases
  • Issues: planned & proposed features

Feedback

If you have any problems, suggestions, or questions about pyinaturalist, you are welcome to create an issue or discussion. Also, PRs are welcome!

Note: pyinaturalist is developed by members of the iNaturalist community, and is not endorsed by iNaturalist.org or the California Academy of Sciences. If you have non-python-specific questions about the iNaturalist API or iNaturalist in general, the iNaturalist Community Forum is the best place to start.

Related Projects

Other python projects related to iNaturalist:

  • naturtag: A desktop application for tagging image files with iNaturalist taxonomy & observation metadata
  • pyinaturalist-convert: Tools to convert observation data to and from a variety of useful formats
  • pyinaturalist-notebook: Jupyter notebook Docker image for pyinaturalist
  • dronefly: A Discord bot with iNaturalist integration, used by the iNaturalist Discord server.

pyinaturalist's Issues

Boolean API params

I just noticed while playing with the new taxa endpoints that boolean API parameters need to be expressed as JS strings, for example:

get_taxa(q='Lixus bardanae', is_active='false')

It would be better to be able to do:

get_taxa(q='Lixus bardanae', is_active=False)

To do:

  • investigate if we have a handy (lightweight, applicable to other endpoints, ...) solution for this
  • if yes, fix
  • if no, properly document this behaviour
  • see if the issue is also present with other functions/endpoints
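One lightweight option is sketched below. It assumes request parameters are collected in a plain dict before the request is sent; `convert_bool_params` is a hypothetical helper name, not an existing pyinaturalist function.

```python
from typing import Any, Dict


def convert_bool_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params with Python booleans replaced by 'true'/'false' strings."""
    return {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }


print(convert_bool_params({'q': 'Lixus bardanae', 'is_active': False}))
# {'q': 'Lixus bardanae', 'is_active': 'false'}
```

Because the conversion happens in one place, the same helper could be applied to any endpoint that accepts boolean parameters.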

Sample response data: Split into raw JSON responses (for testing) and processed python responses (for docs)

Currently the same JSON files are used for testing and for example responses in the docs. Since there are differences (type conversions, etc.), a separate set of modified responses should be made for documentation purposes.

For the docs, some of the more verbose response fields (like geometry GeoJSON, extra levels of nested taxa/observation data, etc.) could be truncated to make it easier to read.

Enable pre-release builds from dev branch

Publishing pre-release builds from the dev branch would be useful, to make it easier to use the latest pyinaturalist changes in a client application. This also helps to discover any build-related issues early on, before merging to master.

Per PEP 440, an artifact with a version suffix like 0.10.dev5 indicates that it's a pre-release; it can be hosted by PyPI, ignored by pip install, and only installed with either pip install --pre or the specific version (pip install pyinaturalist==0.10.dev5).

I've done this in other CI systems, and I'm not sure yet how to do this with Travis CI, but it would be worth exploring.

Add support for both file paths and file-like objects to endpoints that upload files

The following functions have file parameters:

  • rest_api.create_observations()
  • rest_api.add_photo_to_observation()

I'd like to be able to pass either file paths, files, or file-like objects to these, e.g.:

import os
from io import BytesIO

# File path
rest_api.add_photo_to_observation(1234, '~/Photos/obs_1234.jpg', access_token)

# File object (current behavior); open() does not expand '~', so expand it first
with open(os.path.expanduser('~/Photos/obs_1234.jpg'), 'rb') as f:
    rest_api.add_photo_to_observation(1234, f, access_token)

# File-like object
base64_photo_bytes = BytesIO(b'...')
rest_api.add_photo_to_observation(1234, base64_photo_bytes, access_token)

This should be straightforward to add, preferably with a reusable utility function.
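A reusable utility along these lines might look like the following minimal sketch. The name `ensure_file_obj` is hypothetical, and treating raw bytes as a third accepted input is an assumption beyond what is listed above.

```python
import os
from io import BytesIO
from typing import IO, Union


def ensure_file_obj(photo: Union[str, bytes, IO[bytes]]) -> IO[bytes]:
    """Return a binary file-like object for a path, raw bytes, or an already-open file."""
    if hasattr(photo, 'read'):
        # Already a file or file-like object; pass it through unchanged
        return photo
    if isinstance(photo, bytes):
        return BytesIO(photo)
    if isinstance(photo, str):
        # Treat strings as file paths; open() does not expand '~' by itself
        return open(os.path.expanduser(photo), 'rb')
    raise TypeError(f'Unsupported photo type: {type(photo).__name__}')
```

When a path is passed in, the caller is still responsible for closing the returned file.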

Add full API request params w/ type annotations and docstrings for all endpoints

Some functions already have this, but I think it will be useful to make all the function signatures consistent, and add annotated request params for the endpoints that don't have them yet. This is mainly to make them easier to use within an IDE without referring to the API docs, and to provide better runtime errors if an invalid parameter is passed.

This will just replace the keyword args **params for some functions. To avoid breaking backwards compatibility, the remaining functions with the positional arg params (like get_observations()) will accept both, and will show a DeprecationWarning if called with params=<dict> (but otherwise still work as they did previously); params will then be removed in a future release. For example:

import warnings
from typing import Any, Dict

def get_observations(
    params: Dict = None,
    acc: bool = None,
    captive: bool = None,
    endemic: bool = None,
    geo: bool = None,
    # ... remaining request params
    user_agent: str = None,
) -> Dict[str, Any]:
    if params:
        warnings.warn(DeprecationWarning("The 'params' argument is deprecated; please use keyword arguments instead"))

Endpoints to update

  • node_api.get_observations() (Note: This has 77 parameters! Yikes!)
  • node_api.get_observation_species_counts()
  • node_api.get_all_observations()
  • node_api.get_taxa()
  • node_api.get_taxa_autocomplete()
  • rest_api.get_observations()
  • rest_api.create_observations()
  • rest_api.update_observation()

Parse timestamps into python datetime objects

I just realized that observation timestamps (observed_on_string) aren't in a consistent format. This field may come directly from user-submitted photo metadata. The main issue here is timezone offset.

Some timestamps declare this explicitly, for example: 'Sat Sep 26 2020 12:09:51 GMT-0700 (PDT)'
While others only have the timezone code: '2020-09-27 9:20:02 AM PDT'
python-dateutil is able to handle most other variations, but can't automatically convert a timezone code to a timezone offset.

Because of this, I think it would be useful to handle this in pyinaturalist observation endpoints and any other endpoints that return timestamps. Otherwise, any client code that wants to make use of observation time information would have to handle this. There is a separate time_zone_offset field that contains an offset string like '-08:00', which could be parsed and added to a timezone-unaware datetime object.

  • Parse timestamps
  • Parse and apply timezone offsets
  • Convert observation created_at
  • Convert observation observed_on
  • Convert observation field created_at
  • Convert observation field observed_on
  • Convert project created_at
  • Convert project updated_at
  • Convert taxon created_at
  • Convert taxon updated_at
  • Unit tests
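The offset handling described above can be sketched with the standard library alone. This is a minimal sketch that assumes the offset string always has the form '±HH:MM' (as in '-08:00') and that the timestamp itself has already been parsed into a timezone-unaware datetime; `parse_offset` and `apply_offset` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def parse_offset(offset: str) -> timezone:
    """Convert an offset string like '-08:00' into a fixed-offset tzinfo."""
    sign = -1 if offset.startswith('-') else 1
    hours, minutes = offset.lstrip('+-').split(':')
    return timezone(sign * timedelta(hours=int(hours), minutes=int(minutes)))


def apply_offset(naive_dt: datetime, offset: str) -> datetime:
    """Attach a fixed offset to a timezone-unaware datetime without shifting the clock time."""
    return naive_dt.replace(tzinfo=parse_offset(offset))


observed = apply_offset(datetime(2020, 9, 27, 9, 20, 2), '-08:00')
print(observed.isoformat())
# 2020-09-27T09:20:02-08:00
```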

Add a test mode

Especially useful to test code that does some "write" operations.

The idea is that in test mode, the API calls are not performed but logged somewhere (to be defined).
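As a rough sketch of the idea, a single request wrapper could check a flag and log the call instead of sending it. Everything here is an assumption for illustration: the `send_request` name, the `send` callable that performs the real request, and the use of the standard logging module as the "somewhere".

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger('dry_run_example')


def send_request(
    method: str,
    url: str,
    send: Callable[..., Any],
    dry_run: bool = False,
    **kwargs: Any,
) -> Optional[Any]:
    """Send a request via send(), or only log it when dry_run is enabled."""
    if dry_run:
        logger.info('Dry run: %s %s %s', method, url, kwargs)
        return None
    return send(method, url, **kwargs)
```

Routing every API call through one function like this would keep the test-mode check in a single place.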

How to get projects information?

Hello,

Thank you for the work done; it is really useful. I would like to know how I can get all the project information. Is it possible?

Thank you in advance,
Miriam

Add an example of creating a choropleth map

A choropleth map of observations would be a great example to add. This can be just for a single country for now, preferably at the county (or equivalent) level.

Example: https://altair-viz.github.io/gallery/choropleth.html

Ideally this would use standardized place IDs (like FIPS codes in the United States) instead of iNat place IDs, to make it easier to combine this with other geospatial datasets. Plan B would be to just use iNat place IDs and polygons.

Add GET /observations/observers endpoint from v1 API

Add the /observations/observers endpoint from Node API.

My selfish motivation behind this: I'd like to be able to ask questions related to user engagement in a project, for instance "how many users have made 10+ observations in project X?," and this endpoint seems to be the place to do it.

It seems pretty straightforward and could be coupled with adding the /observations/identifiers endpoint as well.

  • /observations/observers
  • /observations/identifiers
  • Docstrings + type annotation
  • Usage examples
  • Sample response data
  • Unit tests
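To illustrate the kind of question this endpoint could answer, here is a minimal sketch. It assumes the response contains a 'results' list in which each item carries an 'observation_count' field; that shape, the sample data, and the `count_active_observers` name are assumptions for illustration.

```python
from typing import Any, Dict


def count_active_observers(response: Dict[str, Any], min_observations: int = 10) -> int:
    """Count users in an observers response with at least min_observations observations."""
    return sum(
        1
        for result in response.get('results', [])
        if result.get('observation_count', 0) >= min_observations
    )


# Hand-written sample data in the assumed response shape (not real API output)
sample_response = {
    'results': [
        {'user_id': 1, 'observation_count': 42},
        {'user_id': 2, 'observation_count': 10},
        {'user_id': 3, 'observation_count': 3},
    ]
}
print(count_active_observers(sample_response))
# 2
```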

If you're ok with this, I'll start working on it over the next few weeks.

Add computer vision features

After an email exchange with a user of the library, it appears it would be nice to add access to the computer vision features (species suggestions based on a picture) to pyinaturalist.

After having a look at the source code and what the webapp does, we discovered two undocumented endpoints in the Node API:

https://github.com/inaturalist/iNaturalistAPI/blob/5b3b6d4a588f5bf68259467fc37809f6aa2371ac/lib/inaturalist_api.js#L268-L276

  • score_observation is used by the webapp when asking for suggestions for an already existing observation.
  • I guess score_image is even more interesting, since it allows scoring an uploaded image without needing an iNaturalist observation. No clue how to use it (image upload), so we'll have to experiment a bit.

Is the fact that it is currently undocumented an issue? I can imagine the iNat team does not want its servers flooded by people who have no interest in the project... (but pyinaturalist is probably obscure enough to avoid this). Also, maybe this API is considered semi-private and therefore changes frequently?

Transfer project

Hey @JWCook!

Since you have made great contributions to this project, and since it's unfortunately difficult for me to properly follow it at the moment: would you be interested in becoming its official maintainer?

set requests cert behind firewall

This might be simple, and if so, sorry for adding to the clutter.

I need to use a custom certificate in my calls. For example, here's the result using your very first demo:


from pyinaturalist.node_api import get_all_observations

obs = get_all_observations(params={'user_id': 'tghoward'})
Traceback (most recent call last):

  File "<ipython-input-18-33257754f514>", line 1, in <module>
    obs = get_all_observations(params={'user_id': 'tghoward'})

  File "C:\Users\tghoward\AppData\Local\ESRI\conda\envs\arcpro_tim\lib\site-packages\pyinaturalist\node_api.py", line 80, in get_all_observations
    page_obs = get_observations(params=iteration_params, user_agent=user_agent)

  File "C:\Users\tghoward\AppData\Local\ESRI\conda\envs\arcpro_tim\lib\site-packages\pyinaturalist\node_api.py", line 53, in get_observations
    r = make_inaturalist_api_get_call('observations', params=params, user_agent=user_agent)

  File "C:\Users\tghoward\AppData\Local\ESRI\conda\envs\arcpro_tim\lib\site-packages\pyinaturalist\node_api.py", line 26, in make_inaturalist_api_get_call
    response = requests.get(urljoin(INAT_NODE_API_BASE_URL, endpoint), params, headers=headers, **kwargs)

  File "C:\Users\tghoward\AppData\Local\ESRI\conda\envs\arcpro_tim\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)

  File "C:\Users\tghoward\AppData\Local\ESRI\conda\envs\arcpro_tim\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)

  File "C:\Users\tghoward\AppData\Local\ESRI\conda\envs\arcpro_tim\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)

  File "C:\Users\tghoward\AppData\Local\ESRI\conda\envs\arcpro_tim\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)

  File "C:\Users\tghoward\AppData\Local\ESRI\conda\envs\arcpro_tim\lib\site-packages\requests\adapters.py", line 514, in send
    raise SSLError(e, request=request)

SSLError: HTTPSConnectionPool(host='api.inaturalist.org', port=443): Max retries exceeded with url: /v1/observations?user_id=tghoward&order_by=id&order=asc&per_page=30&id_above=0 (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),))

I notice you are using the requests library and a little sleuthing led me to successful connection using requests directly:

import requests
verify="D:/ca-bundle.crt"
response = requests.get('https://www.inaturalist.org/observations/30723019.json', verify=verify)

the object response has the data I requested.

My question: how can I define the correct certificate in my get_all_observations call?

thanks in advance.
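One possible workaround, assuming pyinaturalist calls requests without overriding its verify setting (as the traceback suggests): requests falls back to the REQUESTS_CA_BUNDLE environment variable when no explicit verify argument is given, so the bundle can be set for the whole process before making any calls.

```python
import os

# Path taken from the example above; adjust to wherever the CA bundle lives
os.environ['REQUESTS_CA_BUNDLE'] = 'D:/ca-bundle.crt'

# Requests made after this point in the same process, including the ones
# pyinaturalist makes internally, should verify against that bundle, e.g.:
# obs = get_all_observations(params={'user_id': 'tghoward'})
```

The variable can also be set outside of python, in the shell or system environment, which avoids changing any code.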

Add support for providing credentials via system keyring

Follow-up to issue #68.

There is a convenient cross-platform keyring package that provides integration with the system keyring. This would be useful to have as an optional method of authentication, and would definitely be my preference to use, if it were available.

It's been used on projects by my team at work, and it works fairly well. Keyring backends supported:

  • macOS Keychain
  • Freedesktop Secret Service (on Gnome, Xfce, etc.)
  • KDE4 & KDE5 KWallet (requires dbus)
  • Windows Credential Locker

This could probably be added as an optional dependency. The order of precedence, then, would be:

  1. get_access_token() arguments
  2. Environment variables
  3. Keyring (if keyring is installed)
  4. If none of the above were found, raise an exception
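The order of precedence above could be sketched as follows. The `resolve_credential` name, the INAT_ prefix for environment variables, and the 'pyinaturalist' keyring service name are all assumptions for this sketch; `keyring.get_password(service, username)` is the lookup call provided by the keyring package.

```python
import os
from typing import Callable, Mapping, Optional


def default_keyring_lookup(name: str) -> Optional[str]:
    """Look up a credential in the system keyring, if the optional keyring package is installed."""
    try:
        import keyring
    except ImportError:
        return None
    # 'pyinaturalist' as the service name is an assumption for this sketch
    return keyring.get_password('pyinaturalist', name)


def resolve_credential(
    name: str,
    value: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    keyring_lookup: Callable[[str], Optional[str]] = default_keyring_lookup,
) -> str:
    """Resolve a credential from an argument, then an environment variable, then the keyring."""
    # Hypothetical environment variable naming scheme, e.g. INAT_USERNAME
    found = value or env.get(f'INAT_{name.upper()}') or keyring_lookup(name)
    if not found:
        raise ValueError(f'No value found for credential: {name}')
    return found
```

Because `or` short-circuits, the keyring is only queried when neither the argument nor the environment variable provides a value.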

Fix issue with @document_request_params and duplicate keyword arguments

Follow-up from PR #55.

Example traceback:

  File "/home/cookjo/workspace/pyinaturalist/pyinaturalist/rest_api.py", line 70, in <module>
    def get_observations(user_agent: str = None, **kwargs) -> Union[List, str]:
  File "/home/cookjo/workspace/pyinaturalist/pyinaturalist/forge_utils.py", line 59, in f
    func = copy_signatures(func, template_functions)
  File "/home/cookjo/workspace/pyinaturalist/pyinaturalist/forge_utils.py", line 121, in copy_signatures
    return revision(target_function)
ValueError: Received multiple parameters with name 'user_agent'

Add Observation species counts endpoint from Node API

The GET /observations/species_counts endpoint returns counts of observations of all species matching the given search parameters. This is useful, for example, for getting all the species that a user has observed, ordered by number of observations.

This will also be a good opportunity to add some more request parameter validation. Several observation request parameters are restricted to a list of possible choices; for example, iconic taxa, license, and geoprivacy. If we explicitly define these choices and validate params before sending the API request (with clear error messages for validation errors), it will make it easier to debug bad requests in client applications. Otherwise these inputs would return a 400 response with potentially less helpful/predictable error messages.

Main Tasks

  • Endpoints
    • /observations/species_counts
    • Parameter validation for all multiple-choice request params
  • Docs
    • Docstrings + type annotations
    • Usage examples
    • Update release notes
  • Tests
    • Sample response data
    • Unit tests for new endpoint
    • Unit tests for additional utility functions
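The parameter validation described above might look roughly like this. The choices listed are an illustrative subset written from memory, so the API docs remain the authoritative source; `validate_multiple_choice_params` is a hypothetical name.

```python
from typing import Any, Dict, Set

# Illustrative subset of choices (an assumption); see the API docs for the full lists
MULTIPLE_CHOICE_PARAMS: Dict[str, Set[str]] = {
    'geoprivacy': {'obscured', 'obscured_private', 'open', 'private'},
    'quality_grade': {'casual', 'needs_id', 'research'},
}


def validate_multiple_choice_params(params: Dict[str, Any]) -> None:
    """Raise ValueError with a clear message if a multiple-choice param has an invalid value."""
    for name, choices in MULTIPLE_CHOICE_PARAMS.items():
        if name not in params:
            continue
        value = params[name]
        # Accept either a single value or a collection of values
        values = value if isinstance(value, (list, tuple, set)) else [value]
        invalid = [v for v in values if v not in choices]
        if invalid:
            raise ValueError(
                f"Invalid value(s) for '{name}': {invalid}; allowed: {sorted(choices)}"
            )
```

Running this before the request is sent turns what would otherwise be a 400 response into an immediate, descriptive error.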

Customize readthedocs builder to use apidoc and auto-generate Sphinx sources

Sphinx-apidoc is a useful tool to take autodoc a step further and to generate Sphinx source files, so you don't have to remember to make a .rst file for every one of your modules. This is easy to do in a Makefile or other script; the only catch is that readthedocs uses its own builder, and doesn't use the Makefile.

Fortunately, others have wanted to do the same thing, and sphinxcontrib-apidoc exists for that purpose. The same method used by that extension (adding hooks into Sphinx builder events) can be used to add any other custom behavior that would otherwise go into the Makefile. This has the added benefit of making those steps just work cross-platform, without the need for a separate Makefile and make.bat.

Allow setting custom user agent

Hi - I'm an iNat DevOps engineer. I don't have experience with pyinaturalist as a user, but I do see the requests coming in. We seem to be getting more requests with user agent python-requests/2.20.0. I'm wondering if it's possible for users to set a custom user agent, and if so could that be added as a recommended best practice? Ideally the user agent would indicate they are using pyinaturalist, and it would include their iNaturalist login, or their email, or their project name. That way people acting in good faith don't get mixed up with the scrapers, who we may ban for making an excessive number of API calls. Thanks for considering!
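For illustration, a user agent along the lines requested could be assembled like this. The exact format and the `build_user_agent` name are assumptions; the only requirement stated above is that the string identifies pyinaturalist plus a login, email, or project name.

```python
from typing import Optional


def build_user_agent(client: str, contact: str, project: Optional[str] = None) -> str:
    """Build a descriptive User-Agent string that identifies the client library and its user."""
    details = [contact] if project is None else [project, contact]
    return f"{client} ({'; '.join(details)})"


print(build_user_agent('pyinaturalist', 'my_username', project='my_project'))
# pyinaturalist (my_project; my_username)
```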

Help from an interested party (formal introduction)

Hello! I am Jacob from the U.S. If you look at my profile you'll see that my colleague and I are working on some applications for collectors (mostly students and researchers). We were looking into consuming the iNaturalist API to send our observations there as well as to our other destination. I stumbled upon your project and I thought it might be a good idea to reach out and get an idea of what your plans are; for example, do you plan to add the module to PyPI eventually? We may like to help you out!

Greetings from across the pond,
Jacob

How to manage errors

For example, currently client code can receive something like {'error': 'Cette observation n’existe plus.'} (French for "This observation no longer exists.") as a return value from update_observation()...

So the client code should look for an 'error' key in each of the results (and I'm not sure the API is really consistent in that). It seems it would be more pythonic and nicer to raise an exception (a very specific one if possible, or a general one like Error 500 otherwise) each time the status is not 200.
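A minimal sketch of that behavior is shown below. The exception class and the `raise_for_api_error` helper are hypothetical names, and the check assumes the status code and the decoded JSON body are both available at the point where the response is handled.

```python
from typing import Any


class INaturalistAPIError(Exception):
    """Hypothetical exception raised when the API reports an error."""

    def __init__(self, status_code: int, message: str):
        super().__init__(f'HTTP {status_code}: {message}')
        self.status_code = status_code
        self.message = message


def raise_for_api_error(status_code: int, body: Any) -> Any:
    """Return the response body unchanged, or raise if the status or body indicates an error."""
    has_error_key = isinstance(body, dict) and 'error' in body
    if status_code != 200 or has_error_key:
        message = body['error'] if has_error_key else 'Unexpected response status'
        raise INaturalistAPIError(status_code, str(message))
    return body
```

Client code could then use a regular try/except block instead of inspecting every return value for an 'error' key.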

Add 'upsert' function for observation field values

Currently, put_observation_field_values() will fail if the specified observation field value already exists. It would be much more convenient to create if it doesn't exist, and update (delete and recreate) otherwise. I see this was already noted in the TODOs.

Create basic documentation @RTD

  • Make the doc basics and make it build locally
  • Configure/publish on ReadTheDocs
  • Set up GitHub hooks for automatic builds (needs permission to do so)
  • Make an API page (set type annotations and try to use them with autodoc?)
  • Install page: show how to install master from GitHub
  • Add a "release page" with complete release instructions (for future me)

Datetime API params

Datetime format issue

After looking into other API request parameter types for Issue #17, I think dates and datetimes could benefit from some additional conversion or at least sanity checks. I'm not sure how many formats work, but from a couple of quick tests it looks like both the Node API and REST API at least accept ISO 8601 format, optionally with time zone (2020-05-06T02:43:09.333876-05:00).

When an unsupported date format is provided, though, the API will ignore the param and quietly return incorrect results. So in get_observations(), for example, the params created_d1 and created_d2 currently work fine if you provide strings in the correct format, but will provide unexpected results (e.g., 200 response w/ results but no time constraints) if not. For example:

# Unrecognized format; results not filtered by created date
>>> r = get_observations({'id': 45210532, 'created_d1': '20200508194556'}); len(r['results'])
1
# Recognized format; results correctly filtered by created date
>>> r = get_observations({'id': 45210532, 'created_d1': '2020-05-08T19:45:56'}); len(r['results'])
0

A convenient addition to that would be the ability to accept either:

  1. A datetime object
  2. A datetime string in any format recognized by dateutil

And then convert to ISO 8601 and add local timezone info (±xx:xx) before sending the request. Let me know if that sounds good to you, or if you have other ideas on how to handle that.
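The proposed conversion could be sketched as follows. To stay within the standard library, this sketch uses `datetime.fromisoformat()` as a stand-in for dateutil's parser, so it accepts far fewer string formats than the proposal; `to_iso_with_timezone` is a hypothetical name.

```python
from datetime import datetime
from typing import Union


def to_iso_with_timezone(value: Union[datetime, str]) -> str:
    """Convert a datetime or ISO-like string to ISO 8601 with a UTC offset."""
    if isinstance(value, str):
        # Stand-in for dateutil.parser.parse(), which accepts many more formats
        value = datetime.fromisoformat(value)
    if value.tzinfo is None:
        # astimezone() on a naive datetime assumes system local time and attaches its offset
        value = value.astimezone()
    return value.isoformat()
```

Values that already carry timezone info pass through unchanged, so only timezone-unaware inputs pick up the local offset.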

Timezone Issue

This is pretty minor and most people probably wouldn't even notice, but searches on datetime parameters are not accurate down to the hour. If timezone info isn't provided, the API assumes UTC time. For example, take this observation created on 2020-05-07T23:57:29 SAST (UTC+2):

# No results, because API assumes UTC
>>> r = get_observations({'id': 45210532, 'created_d1': '2020-05-07T23:00:00'}); len(r['results'])
0
# Returns the expected result when either timezone offset is specified or UTC time is specified with no offset
>>> r = get_observations({'id': 45210532, 'created_d1': '2020-05-07T23:00:00+02:00'}); len(r['results'])
1
>>> r = get_observations({'id': 45210532, 'created_d1': '2020-05-07T21:00:00'}); len(r['results'])
1

Fortunately, adding timezone info from local system time is an easy fix.

Improve parameter handling and docs for creating, updating, and deleting observations

  • Test & document unknown/undefined behavior noted in TODOs for create_observations()
  • Test & document unknown/undefined behavior noted in TODOs for update_observations()
  • Test & document unknown/undefined behavior noted in TODOs for delete_observations()
  • Simplify usage of observation_field_values_attributes for create and update; support a flat dict instead of nested list of dicts
  • Implement local_photos for create and update, and support either local file path(s) or object(s)
  • Update add_photo_to_observation() with support for local_photos
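For the observation_field_values_attributes item, the flat-dict idea could be sketched like this. The key names 'observation_field_id' and 'value' in the nested format are assumptions about what the API expects, and `expand_observation_field_values` is a hypothetical helper.

```python
from typing import Any, Dict, List


def expand_observation_field_values(field_values: Dict[int, Any]) -> List[Dict[str, Any]]:
    """Expand a flat {field_id: value} dict into a nested list of dicts."""
    return [
        {'observation_field_id': field_id, 'value': value}
        for field_id, value in field_values.items()
    ]


print(expand_observation_field_values({297: 1, 1234: 'yes'}))
# [{'observation_field_id': 297, 'value': 1}, {'observation_field_id': 1234, 'value': 'yes'}]
```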

Add Controlled Terms endpoints from Node API

I'd like to add the two Controlled Terms endpoints from the Node API.

  • /controlled_terms
  • /controlled_terms/for_taxon
  • Docstrings + type annotation
  • Error handling
  • Usage examples
  • Sample response data
  • Unit tests

Note: /controlled_terms/for_taxon appears to return a 422 if the specified taxon does not exist, which is odd; 404 would seem more appropriate. This should be wrapped in a custom exception; see #49

Integrate Jupyter notebooks into Sphinx docs

The nbsphinx extension can be used for this. As a nice but optional addition, it may also be possible to generate a gallery of thumbnails from notebook output images using sphinx-gallery.

Add GET /observations/histogram endpoint from v1 API

API docs: https://api.inaturalist.org/v1/docs/#!/Observations/get_observations_histogram

This is a useful endpoint that returns different response formats depending on the time interval specified (year, month, week, day, hour, month of year, week of year). Month of year and week of year return integer keys (as strings, since it's JSON); the others return timestamps.

Example response for 'month of year' (default):

{
  "total_results": 12,
  "page": 1,
  "per_page": 12,
  "results": {
    "month_of_year": {
      "1": 272,
      "2": 253,
      "3": 992,
      "4": 3925,
      "5": 7983,
      "6": 7079,
      "7": 9150,
      "8": 8895,
      "9": 8374,
      "10": 6060,
      "11": 920,
      "12": 382
    }
  }
}

Example response for 'month':

{
  "total_results": 13,
  "page": 1,
  "per_page": 13,
  "results": {
    "month": {
      "2020-01-01": 271,
      "2020-02-01": 253,
      "2020-03-01": 992,
      "2020-04-01": 3925,
      "2020-05-01": 7983,
      "2020-06-01": 7079,
      "2020-07-01": 9150,
      "2020-08-01": 8895,
      "2020-09-01": 8374,
      "2020-10-01": 6060,
      "2020-11-01": 920,
      "2020-12-01": 382,
      "2021-01-01": 1
    }
  }
}

To make this response more pythonic, I think it would be most useful to return:

  • a dictionary of {int: int} for the 'month/week of year' intervals
  • a dictionary of {datetime: int} for all other intervals.

The pagination metadata is not necessary for this endpoint since the response will always be returned in a single page.

  • Add histogram endpoint
  • Convert responses appropriately for each interval type
  • Docstrings + type annotations
  • Usage examples
  • Sample response data (for unit tests)
  • Sample response data (for docs)
  • Unit tests
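The conversion proposed above could be sketched as follows, using the response shapes from the two examples. The 'week_of_year' key is assumed by analogy with 'month_of_year', only date-style keys like those shown for 'month' are handled, and `convert_histogram` is a hypothetical name.

```python
from datetime import datetime
from typing import Any, Dict, Union

# Intervals whose keys are integers (serialized as strings in the JSON response)
INT_INTERVALS = ('month_of_year', 'week_of_year')


def convert_histogram(response: Dict[str, Any]) -> Dict[Union[int, datetime], int]:
    """Flatten a histogram response into {int: count} or {datetime: count}."""
    # The results dict has a single key naming the interval
    interval, counts = next(iter(response['results'].items()))
    if interval in INT_INTERVALS:
        return {int(key): count for key, count in counts.items()}
    return {datetime.fromisoformat(key): count for key, count in counts.items()}


by_month_of_year = convert_histogram({'results': {'month_of_year': {'1': 272, '2': 253}}})
print(by_month_of_year)
# {1: 272, 2: 253}
```

The pagination metadata is simply dropped, since only the contents of 'results' are read.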

Add additional response formats for observations

Built-in formats

I would like to add support for additional response formats for GET /observations (from original REST API): https://www.inaturalist.org/pages/api+reference#get-observations . I am mainly interested in the dwc format for Simple Darwin Core, but will add the rest while I'm at it.

GeoJSON

Another relevant response format that the API doesn't currently provide is GeoJSON FeatureCollections. This would store the coordinates as the feature geometry, and the remaining observation info as feature properties. This is useful for displaying observations in Mapbox / Leaflet. I recently worked on a small PoC that did this, so I plan on cleaning that up a bit and including that as well.

Example:

{
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [
                    -73.1806613975,
                    44.0194705879
                ]
            },
            "properties": {
                "common_name": "Fall Armyworm Moth",
                "observation_id": 2334322,
                "positional_accuracy": 3,
                "quality_grade": "research",
                "taxon_id": 132468,
                "taxon_name": "Spodoptera frugiperda",
                "timestamp": "2015-10-30T20:07:00-05:00",
                "uri": "http://www.inaturalist.org/observations/2334322"
            }
        }
    ]
}

Since the observation response returns a lot of info that may not be needed in a map viewer context, the included properties should be customizable, e.g. by providing an optional list of keys to include. It should also have some sensible default that includes the most basic info; tentatively, the 8 fields shown in the example above would be a good starting point.

Another caveat is that this would obviously only include observations with GPS info and Public geoprivacy. The parameters has[]=geo&observation[geoprivacy]=public should do that. Not sure if Obscured coords should be returned or not.
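
As a rough illustration of the conversion described above, here is a minimal sketch. It assumes each observation has already been flattened into a dict with `latitude` and `longitude` keys plus the 8 property names from the example; that input shape and the function name `to_feature_collection` are hypothetical, not part of pyinaturalist.

```python
# The 8 default properties from the example above
DEFAULT_PROPERTIES = [
    "common_name",
    "observation_id",
    "positional_accuracy",
    "quality_grade",
    "taxon_id",
    "taxon_name",
    "timestamp",
    "uri",
]


def to_feature_collection(observations, properties=None):
    """Convert flattened observation dicts (hypothetical shape) to a GeoJSON FeatureCollection"""
    keys = DEFAULT_PROPERTIES if properties is None else properties
    features = []
    for obs in observations:
        # Skip observations without coordinates
        if obs.get("latitude") is None or obs.get("longitude") is None:
            continue
        features.append(
            {
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    # GeoJSON positions are ordered [longitude, latitude]
                    "coordinates": [float(obs["longitude"]), float(obs["latitude"])],
                },
                # Only include the requested keys; missing keys become None
                "properties": {key: obs.get(key) for key in keys},
            }
        )
    return {"type": "FeatureCollection", "features": features}
```

Passing `properties=["observation_id", "taxon_name"]` would trim each feature down to those two keys, which is one way the "optional list of keys" idea could work.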

Add data visualization examples

Summary

In order to show off some of the cool data that iNaturalist provides, combined with the power of python, it would be great to add some examples to the README that make use of data visualization tools.

Background Info

If you are coming here from Hacktoberfest or the good-first-issue topic, welcome! Recommended experience includes:

  • Familiarity with iNaturalist
  • Familiarity with GitHub-flavored Markdown
  • Basic to intermediate experience with the python language
  • A plus, but not required: familiarity with pandas or other tabular data processing tools

Tasks

I would like to have examples with a few different data visualization tools, each making use of one or more pyinaturalist API functions. You could pick any of the following:

  1. A basic example using pandas + matplotlib and/or seaborn
  2. A fancier example using bokeh or altair
  3. A geospatial example using geoviews

Submission

  • Ideally, the example should be in the form of a Jupyter Notebook. If you're not familiar with Jupyter, a python code snippet and an image of the output would be fine as well.
  • Submit a pull request with your file(s) under the examples folder.
  • Also see the Contributing guide for guidelines on submitting pull requests to this project

Reference & Examples

Visualization Ideas

These are just some suggestions, so feel free to come up with your own!

Matplotlib:
Pick two species with a predator-prey relationship, and create a scatter plot of their populations in different discrete regions (like all states in the US, or territories & provinces in Canada, or provinces in South Africa, etc.)

Bokeh:

GeoViews:

Add consistent error handling for all API functions

Follow-up from #4 and #5.

Summary

I'd like to propose making the error handling consistent across all API functions by doing the following:

  • Make all requests call raise_for_status() before returning the response (currently some, but not all, endpoints do this)
  • Add additional custom exception classes for some common error types.
    • The main purpose of this will be to provide more useful/immediately obvious error messages in cases where the existing error is ambiguous or unclear.
    • There are plenty of errors returned by the API that are perfectly clear and don't need to be messed with.
  • Make all custom exception classes inherit from requests.HTTPError, in order to make error handling simpler in client code (e.g., so try ... except HTTPError will catch all request errors)
  • Make all validation errors related to request parameters raise a subclass of ValueError. This would include both 400 errors returned by the API, and issues caught before sending the request.

This is roughly how I've done API error handling in other projects, but before working on this I'd like to look over some other python API clients out there to see if there are other approaches to consider, especially for handling client-side validation errors. For example, using a serialization framework like marshmallow is a thorough way to handle validation, but I've only done that in API implementations (server side), not in API clients. That might be overkill for pyinaturalist.

New custom exceptions (WIP)

There are plenty more that could be added, but for a start, a custom exception class could be useful for each of the following cases:

  • get_controlled_terms(): 422 error if a taxon ID doesn't exist
    • Unintuitive; it's more typical for an API to return a 404 for any operation on a non-existent resource
    • Suggestion: TaxonNotFound
  • update_observation(): 500 error from trying to update a nonexistent obs
    • Unintuitive; it's more typical for an API to return a 404 for any operation on a non-existent resource
    • Suggestion: ObservationNotFound (class already exists)
  • update_observation(): 410 error from trying to update the observation of a different user
    • Unintuitive; a 403 (forbidden) would be more typical.
    • Suggestion: NotAuthorizedError (and maybe make this a subclass of AuthenticationError?)
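
One possible shape for this, as a sketch only: the exception classes below are redefined locally for illustration (including `ObservationNotFound`, which already exists in the project), and `ERROR_OVERRIDES` and `translate_error()` are hypothetical names. It assumes the `requests` package is installed.

```python
from requests import HTTPError, Response


# Hypothetical custom exceptions; all inherit from HTTPError so that
# `except HTTPError` in client code still catches them
class TaxonNotFound(HTTPError):
    pass


class ObservationNotFound(HTTPError):
    pass


class NotAuthorizedError(HTTPError):
    pass


# (function name, status code) -> clearer exception class, per the cases listed above
ERROR_OVERRIDES = {
    ("get_controlled_terms", 422): TaxonNotFound,
    ("update_observation", 500): ObservationNotFound,
    ("update_observation", 410): NotAuthorizedError,
}


def translate_error(function_name, error):
    """Return a more specific exception for known ambiguous errors, otherwise the original"""
    # Compare against None explicitly: a Response with an error status is falsy
    status = error.response.status_code if error.response is not None else None
    exception_class = ERROR_OVERRIDES.get((function_name, status))
    if exception_class is None:
        return error
    return exception_class(str(error), response=error.response)


# Usage: simulate the 410 response described above
response = Response()
response.status_code = 410
original = HTTPError("410 Client Error: Gone", response=response)
translated = translate_error("update_observation", original)
print(type(translated).__name__)  # NotAuthorizedError
```

Errors that are already clear pass through unchanged, so only the listed cases get a new exception type.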

Add support for providing credentials via environment variables

As an alternative to passing user credentials as arguments to rest_api.get_access_token(), it would be useful to also have the option of providing these via shell environment variables, for example:

export INAT_USERNAME=""
export INAT_PASSWORD=""
export INAT_APP_ID=""
export INAT_APP_SECRET=""

This is a pretty common pattern in other applications (for example, the AWS CLI), since environment variables offer a bit more flexibility. For example, creds can be stored in the system keychain or other secure format and then loaded with something like export INAT_PASSWORD=$(secret-tool lookup ...).

There is also the python keyring package that provides convenient integration with the system keyring, but that could be a separate issue (EDIT: created #69 for this).

If both function arguments and environment variables were to be specified, function arguments would take precedence.
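
A minimal sketch of that precedence rule follows. The helper name `resolve_credentials()` and its `environ` parameter (added to make testing easier) are hypothetical; only the four variable names come from the example above.

```python
import os

# Environment variable names from the example above
ENV_VARS = {
    "username": "INAT_USERNAME",
    "password": "INAT_PASSWORD",
    "app_id": "INAT_APP_ID",
    "app_secret": "INAT_APP_SECRET",
}


def resolve_credentials(username=None, password=None, app_id=None, app_secret=None, environ=None):
    """Combine explicit arguments with environment variables; arguments take precedence"""
    environ = os.environ if environ is None else environ
    explicit = {
        "username": username,
        "password": password,
        "app_id": app_id,
        "app_secret": app_secret,
    }
    resolved = {}
    for name, env_name in ENV_VARS.items():
        value = explicit[name]
        # Fall back to the environment only when no argument was passed
        if value is None:
            value = environ.get(env_name)
        resolved[name] = value

    # Treat unset or empty values as missing
    missing = [name for name, value in resolved.items() if not value]
    if missing:
        raise ValueError("Missing credentials: " + ", ".join(missing))
    return resolved
```

The resulting dict could then be unpacked into the token request, e.g. `get_access_token(**resolve_credentials())`, assuming the keyword names match.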

Add Places endpoints from Node API

I'd like to add the three Places endpoints from the Node API.

Main Tasks

  • Endpoints
    • /places/{id}
    • /places/autocomplete
    • /places/nearby
  • Docs
    • Docstrings + type annotations
    • Usage examples
    • Update release notes
  • Tests
    • Sample response data
    • Unit tests

Other minor changes

  • Make get_places_by_id() accept multiple IDs and check that all are valid integers
  • Update get_taxa_by_id() to be consistent with get_places_by_id()
  • Convert "location" coordinate strings to floats
  • Ensure all observation endpoints also return all coordinates as floats to be consistent with places endpoints
    • Note: Only rest_api.get_observations() (with JSON response format) needed to be updated
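
The ID check and coordinate conversion above could look something like the sketch below. Both helper names are hypothetical, and the "lat,lon" string format for `location` is an assumption about the API response rather than something confirmed here.

```python
def ensure_int_list(value):
    """Accept a single ID or a list/tuple of IDs, and return a list of ints.
    Raises ValueError if any ID is not a valid integer.
    """
    values = value if isinstance(value, (list, tuple)) else [value]
    try:
        # Going through str() rejects floats like 1.5 instead of silently truncating them
        return [int(str(v)) for v in values]
    except ValueError:
        raise ValueError("Invalid ID(s): {}; must be integers".format(value))


def convert_location(location):
    """Convert a 'lat,lon' coordinate string to a [lat, lon] list of floats"""
    latitude, longitude = location.split(",")
    return [float(latitude), float(longitude)]
```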

Add docs summarizing which iNaturalist API endpoints pyinaturalist implements

It would be useful to show in a concise manner which API endpoints pyinaturalist implements, so a new user can more easily get an idea of the features offered. For that purpose, I'd like to add a table listing all iNaturalist API endpoints, indicating which ones have been implemented in pyinaturalist.

Also, now that the list of API functions has grown, as well as the amount of documentation per function, the docs for the node_api and rest_api modules are now quite long. I'd like to add function summary sections at the top of each of these modules. There are a number of Sphinx extensions that could accomplish this.

Add sample response data to Sphinx docs for all endpoints

It would be useful to have full example responses in our docs. I believe we can reuse the response data we already have for unit tests (test/sample_data).

The only problem is that some of these responses are quite large, and would add tons of text to the docs that would be cumbersome to scroll through. So, I would only want to do this if the responses can be put inside collapsible sections that are collapsed by default. Neither Sphinx nor the RTD theme has a feature like that out of the box, but it should be doable with some custom CSS and templates (example 1, example 2).

Another caveat is that some of our functions modify the response data somewhat, like type conversions. The current samples contain the unmodified responses, which are needed for testing said conversions & other response modifications. In those cases (probably 25% of the endpoints or less), it would be fine to just include another copy with the modified response.

Finally, for the sake of code cleanliness, it would be nice (but not required) to implement this in the form of a decorator that takes a path to the sample .json file and modifies the function's docstring with the appropriate Sphinx markup.

E.g., instead of this:

def get_foo(**kwargs):
    """
    [the rest of the docstring]

    Example Response:

        .. container:: toggle

            .. container:: header

                **Show/Hide Example Response**

            .. literalinclude:: sample_response.json
                :language: JSON
    """

We could just write this:

@document_response_format("sample_response.json")
def get_foo(**kwargs):
    """
    [the rest of the docstring]
    """

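A decorator along these lines might be sketched as follows. This is only one possible approach: it normalizes the existing docstring with `inspect.getdoc()` and appends the markup shown earlier, and it assumes the path is written exactly as Sphinx should resolve it.

```python
import inspect

# Sphinx markup for a collapsible example response, matching the docstring shown earlier
TEMPLATE = """
Example Response:

    .. container:: toggle

        .. container:: header

            **Show/Hide Example Response**

        .. literalinclude:: {path}
            :language: JSON
"""


def document_response_format(path):
    """Decorator that appends a collapsible sample response to a function's docstring"""

    def decorator(func):
        # getdoc() strips the docstring's indentation so the appended markup lines up
        doc = inspect.getdoc(func) or ""
        func.__doc__ = doc + "\n" + TEMPLATE.format(path=path)
        return func

    return decorator


@document_response_format("sample_response.json")
def get_foo(**kwargs):
    """
    [the rest of the docstring]
    """
```

Since the function object itself is returned unchanged apart from `__doc__`, its signature and type annotations stay intact for autodoc.
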
Migrate from Travis CI to GitHub Actions

RIP Travis CI 😢
Travis-ci.org is going read-only next month, and travis-ci.com will no longer have free plans for open-source public repos. Moving to GitHub Actions seems to be the popular choice.

Specific features to reproduce:

  • Test multiple python versions
  • Run additional style checks + build tests (only run for python 3.8, not all python versions)
  • Send test coverage results to Coveralls if all tests pass
  • Deploy to pypi using an API token stored in GitHub secrets
  • Deploy only if all tests for all python versions passed
  • Deploy stable builds on git tags only
  • Deploy pre-release builds versioned in the format <major>.<minor>.<patch>.dev<build_number>

Drop support for Python 3.4

Python 3.4 reached EOL in March 2019, and it's been adding a bit of extra work to keep compatibility with it. I'd like to drop this in the release after the next one (which would be v0.11).

Things to clean up/restrictions that would no longer apply:

  • Remove from tests and classifiers
  • Remove typing backport and use the stdlib version
  • Remove merge_two_dicts() and use {**dict, **dict} and other PEP 448 syntax
  • Use latest pytest (4.7+)
  • Use latest tox (3.15+)
  • Use json.JSONDecodeError
  • Remove other python 3.4-specific exceptions/workarounds
  • Remove requirements.txt; known minimum required package versions are all specified in setup.py now, and remaining version pins are not needed
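
For reference, two of the items above look roughly like this on Python 3.5+. The parameter names and the `parse_json()` helper are made up for illustration.

```python
import json

defaults = {"per_page": 30, "order": "desc"}
overrides = {"per_page": 200}

# PEP 448 unpacking replaces merge_two_dicts(); keys from later dicts win
params = {**defaults, **overrides}


def parse_json(text):
    """Return parsed JSON, or None if the text is not valid JSON"""
    try:
        return json.loads(text)
    # Available since Python 3.5; it is a subclass of ValueError
    except json.JSONDecodeError:
        return None
```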

Add main Search endpoint

There is a unified /search endpoint from the Node (v1) API that I would find useful. This appears to be the text search endpoint used by the main search bar on inaturalist.org, which combines results from observations, taxa, places, and possibly other records.

  • Add search endpoint
  • Docstrings + type annotations
  • Usage examples
  • Sample response data
  • Unit tests

Add taxa endpoints from Node API

New Endpoints

I am currently playing around with the three taxa endpoints from the iNat Node API:

  • /taxa
  • /taxa/{id}
  • /taxa/autocomplete

I am working on some simple python wrappers for those. I figured I may as well contribute to this project, since someone else may find those useful. I have a WIP branch here.

Contribution Questions

Before submitting a PR, I have a few questions about contribution guidelines:

  • How do you prefer to handle branching? Should PRs be merged into a dev branch, or straight to master?
  • How do you prefer to handle versioning? Should any PR with non-trivial changes bump the minor version, or would you like to handle that yourself after merging?
  • I see that some wrapper functions take a params dict (like node_api.get_observations()), while others take individual keyword args matching the API request parameters (like rest_api.get_observation_fields()). Personally, in python REST API clients in general, I find individual kwargs more useful. Would that be okay with you?
  • Alternatively, another compromise between those two types of function signature would be to accept a generic unpacked **params, document the individual request params, and pass those along to requests. For example:

def get_example_endpoint(user_agent=None, **params):
    """Example endpoint

    :param int taxon_id: Unique record ID
    :param str rank: Rank of taxa to search
    :param bool is_active: Return only active taxa
    """
    # Pass all keyword arguments along as request parameters
    return make_inaturalist_api_get_call("example", params, user_agent=user_agent)

# Usage
get_example_endpoint(taxon_id=1234)

That is a little less verbose than individual named keyword arguments, but would lack type hints/autocomplete from within an editor. I don't have a strong preference between the two. What are your thoughts?

Improve performance for auto-paginated functions

Functions that perform pagination like get_all_observations() could be improved. They should use requests.Session to take advantage of its connection pooling, and also request the largest possible page size.
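
A sketch of what that could look like is below. The function name is hypothetical, and it assumes that responses contain `results` and `total_results` keys and that 200 is the largest allowed page size. The session only needs a requests-style `get()` method, so a single `requests.Session` can be reused across all pages.

```python
def get_all_results(session, url, params=None, per_page=200):
    """Fetch all pages of results from a paginated endpoint, reusing one session"""
    # Request the largest page size (assumed to be 200) to minimize the number of requests
    params = dict(params or {}, per_page=per_page)
    results = []
    page = 1
    while True:
        # The same session is used for every page, so the connection can be reused
        response = session.get(url, params=dict(params, page=page))
        response.raise_for_status()
        body = response.json()
        results.extend(body["results"])
        # Stop on an empty page, or once everything has been collected
        if not body["results"] or len(results) >= body["total_results"]:
            break
        page += 1
    return results


# Usage (requires the requests package and network access):
# import requests
# with requests.Session() as session:
#     url = "https://api.inaturalist.org/v1/observations"
#     observations = get_all_results(session, url, {"user_id": "my_username"})
```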

Manage errors in update_observation

Should we raise exceptions, or just let client code handle that? If the latter, it would be great to at least document what to expect from iNat. From a quick check, it seems a bit inconsistent:

  • Trying to update a nonexistent obs returns a 500 error, with no more details.
  • Trying to update the obs of a different user gives a 410 error with the details "cette observation n'existe plus" ("this observation no longer exists").
