
pyobis's Introduction

pyobis


Python client for the OBIS API (https://api.obis.org/).

Source on GitHub at iobis/pyobis

What is it?

pyobis is a Python package that helps users fetch data from the OBIS API, which harvests occurrence records from thousands of datasets and makes them available as a single integrated dataset.

The Ocean Biodiversity Information System (OBIS) is a global open-access data and information clearing-house on marine biodiversity for science, conservation, and sustainable development, maintained by IOOS.

Main Features

Here are just a few of the things pyobis can do:

  • Easy handling of OBIS data, easy fetching without handling the raw API response directly.
  • Built-in functions for occurrence, taxon, node, checklist and dataset endpoints of OBIS API.
  • Provides easy export of data to Pandas DataFrame, and helps researchers focus more on analysis rather than data mining.

For examples of how to use this repo, see the Jupyter notebooks in the /notebooks/ directory. NOTE: GitHub's notebook display does not show interactive plots; open the notebooks in a Jupyter hub (e.g. Colab, Binder) for the full experience.

Installation

Install from PyPI

pip install pyobis

Install from conda-forge

pyobis can be installed from the conda-forge channel with:

conda install pyobis --channel conda-forge

More information here

Install latest development version from GitHub

pip install git+https://github.com/iobis/pyobis.git

Or install an editable development version from GitHub for local development. System prerequisites: python3, conda

# fetch code
git clone [email protected]:iobis/pyobis.git
cd pyobis
# install
python -m pip install -r requirements.txt
python -m pip install -r requirements-dev.txt
python -m pip install -e .
# test your installation
python -m pytest
# test and generate a coverage report
python -m pytest -rxs --cov=pyobis tests

Documentation

The official documentation is hosted on GitHub Pages https://iobis.github.io/pyobis.

Library API

pyobis is split up into modules for each of the groups of API methods.

  • checklist - Checklist. Generate a checklist of species under a taxa, IUCN Red List, or most recently added species.
  • dataset - Dataset. Get metadata of datasets (including datasetid, owner, institution, number of records, etc) for a queried spatiotemporal region or taxa.
  • nodes - Nodes. Get records or activities for an OBIS node.
  • occurrences - Occurrence. Fetch occurrence records, geopoints, lookup by scientificname, and extensions (e.g. DNADerivedData, MeasurementOrFacts, etc.)
  • taxa - Taxonomic names. Get taxon records with taxonid or scientificname, and scientific name annotations by the WoRMS team.

You can import the entire library, or each module individually as needed.

Usage Guide

For a detailed usage guide with information about inputs, outputs, and module functions, please read the Usage Guide.

Sample analysis

Sample analyses and visualizations of data fetched with pyobis are available as Jupyter notebooks in the /notebooks/ directory. For the full experience of the interactive plots (e.g. geoplots), open the notebooks in a Jupyter hub (e.g. Google Colab, Binder, or a local installation).

Meta

Further Reading

  • In case you face data quality issues, please look at OBIS QC repo
  • For issues with the package itself, feel free to open an issue here!

pyobis's People

Contributors

7yl4r, ayushanand18, dependabot[bot], ocefpaf, pieterprovoost, pre-commit-ci[bot], pushkar2112, sckott


pyobis's Issues

move tests alongside code modules?

This is a matter of preference, but I prefer to have the test code in the same directories as the module code. Would it be okay with everyone to move the tests alongside the code instead of in a /test/ directory? Here is a demonstration of the options:

  1. tests separate
    .
    ├── /pyobis/
    │   ├── occurrences
    │   │   └── occurrences.py
    │   ├── obisutils.py
    └── tests
        ├── test_occurrence.py
        └── test_obisutils.py
    
  2. tests alongside
    .
    ├── /pyobis/
        ├── occurrences
        │   ├── occurrences.py
        │   └── occurrences_test.py
        ├── obisutils.py
        └── obisutils_test.py
    

I prefer (2). What do y'all think?

notebook about sampling bias?

A potentially helpful notebook to include here is a notebook which illustrates the concept of sampling bias. This notebook will present the issues and leave solving them to be dealt with elsewhere.

Things that could be shown:

  • spatial bias:
    • example: species "X" is pelagic, but records are mostly near-shore
    • include a link to biodiversity index notebooks (work-in-progress here)
  • taxa distribution bias:
    • show biases towards certain taxa vs expected taxa distribution
  • temporal bias

replace setup.py with setup.cfg|pyproject.toml

  • Clarify listing of GBIF instead of OBIS in package description ref (ref)
    • add link to pyGBIF?
    • use a notebook leveraging both pyGBIF & pyOBIS?

Might be worth reviewing the rest of setup.py while you are in there.

obisissues.py is unused?

obisissues is a module which is essentially just a lookup table for error codes coming from the API. I don't think we are using this, and I am also uncertain if this module is up-to-date. We could remove the module, but we may also want to retain it for future use. Some text noting this at the top of the file might be sufficient, or we could keep it in a separate branch. I am not sure what is the best practice for retaining legacy code.

which vertical datum do we use in OBIS?

Overview

Some endpoints in OBIS ask for an additional z coordinate over (x,y) for a location. I hope z means vertical datum here. So, which particular datum do we follow - WGS 1984 or NAD 1983?

One of the endpoints that asks for (x, y, z) is occurrences/tile/{x}/{y}/{z}

A sample query:

https://api.obis.org/v3/occurrence/tile/1/1/1?scientificname=mola%20mola

showing suggestions for possible results for a search query using undocumented API request

Overview

Many functions in the pyobis package require the user to provide exact matches for scientificname, taxonid, or datasetid. While some researchers are interested in one particular record, many who are just starting their project may not know the exact id or scientific name to query for.

Notes

OBIS Mapper utilizes some endpoints to do the exact thing - suggest id or scientific name based on a query term.

Questions

Should we include a function in the occurrences module for searching a species by keyword, say search_keyword(term="query-term-you'd-like-to-search-for")?

And a separate search_keyword(term="query-term-you'd-like-to-search-for") function in the dataset module for searching datasets by name?

Add `.get_mapper_url()` function?

Sometimes it's nice to make sure you're getting the response you would expect, directly from the OBIS mapper - https://mapper.obis.org/

I would like to see an additional response where we can simply return the url for the mapper based on our search configuration. You can use the same url configuration we use for the occurrences.search function but with the following structure

"https://mapper.obis.org/?" + occurrence_search_config.replace("occurrence?", "")
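A minimal sketch of how such a helper might work, assuming the mapper accepts the same query string as the occurrence API endpoint (the function name `get_mapper_url` is hypothetical, not part of pyobis):

```python
from urllib.parse import urlsplit

def get_mapper_url(api_url: str) -> str:
    """Translate an occurrence API URL into an OBIS mapper URL by
    reusing its query string. Hypothetical helper, shown as a sketch."""
    query = urlsplit(api_url).query
    return "https://mapper.obis.org/?" + query

print(get_mapper_url("https://api.obis.org/v3/occurrence?taxonid=1363"))
# → https://mapper.obis.org/?taxonid=1363
```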

master -> main

while we're doing all this work, mind renaming master to main?

incomplete fetch of checklist from `/checklist` endpoint

Overview

When we try to generate an OBIS checklist using the pyobis.checklist -> list() function, it returns a checklist of at most 10 records. This is because the OBIS API by default returns only 10 records per query.
To fetch subsequent records, we need to pass a skip parameter with the number of records already fetched.

For example, let us look at this request

https://api.obis.org/v3/checklist?size=10&skip=10&taxonid=1363

It fetches the next 10 records after the first 10 have been fetched.

To reproduce

Run

from pyobis.checklist import ChecklistQuery
ChecklistQuery().list(taxonid=1363)["total"] # total records
len(ChecklistQuery().list(taxonid=1363)["results"]) # total fetched

Note: This is not something mentioned in the documentation, and I got this insight thanks to OBIS Mapper.
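The paging arithmetic a fix would need can be sketched as a pure helper: given the `total` reported by the API and a page `size`, yield the `skip` value and page size for each request (the real fetch loop would pass each `skip` to the /checklist endpoint; `page_offsets` is a hypothetical name):

```python
def page_offsets(total: int, size: int = 10):
    """Yield (skip, size) pairs covering `total` records in pages of `size`."""
    for skip in range(0, total, size):
        yield skip, min(size, total - skip)

# e.g. 25 total records in pages of 10:
print(list(page_offsets(25, 10)))  # → [(0, 10), (10, 10), (20, 5)]
```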

add notebooks to docs

Is there a way to include the jupyter notebooks in the docs? A rendered output would be nice, but a list of links & short descriptions would be good too.

Add capability to query for multiple taxons using `taxonids` at the same time

Overview

At present, all functions that query by taxonid require it to be a single valid integer (as listed in WoRMS). It is not possible to cherry-pick multiple taxonids in a single query.

It would be a powerful feature to support arrays of taxonids, as we already do for scientificname. It would save both time and effort.

[Bug] pyobis.occurrences: all records not being returned by search()

Overview

The occurrences.search() function returns only the first set of records (limited to 10 rows) from OBIS. But, we need to fetch all occurrence records from the API.

Steps to reproduce this bug

  • Install pyobis
  • Run
    res = occurrence.search() # we can also query for any particular scientificname but we get the same issue
    df = pandas.DataFrame(res["results"])
    print(len(df)) # it will show only 10, however the total number of records can be determined by executing res["total"]
    

Add `.get_search_url()` function to all modules

For each of the modules, we should have the ability to return the API url that is configured with our identified search.

For example,

res = occ.search(taxonid="1363", startdepth=0, enddepth=30, geometry="POLYGON ((-180 -30, 180 -30, 180 30, -180 30, -180 -30))")

Then,

res.get_search_url()

would return

https://api.obis.org/v3/occurrence?taxonid=1363&startdepth=0&enddepth=30&geometry=POLYGON%20%28%28-180%20-30%2C%20180%20-30%2C%20180%2030%2C%20-180%2030%2C%20-180%20-30%29%29

This gives the user a way to investigate how their query might be not providing the expected responses.
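One way such a helper could build the URL is with `urllib.parse.urlencode`, which handles the percent-encoding shown above (a sketch; `build_search_url` is a hypothetical name, not current pyobis API):

```python
from urllib.parse import urlencode

def build_search_url(**params) -> str:
    """Return the occurrence endpoint plus percent-encoded query
    parameters, roughly what a .get_search_url() could expose."""
    return "https://api.obis.org/v3/occurrence?" + urlencode(params)

url = build_search_url(taxonid="1363", startdepth=0, enddepth=30)
print(url)
# → https://api.obis.org/v3/occurrence?taxonid=1363&startdepth=0&enddepth=30
```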

Related to #69

occ.search(...,mof=True) & occ.search(...,mof=False) are inconsistent

The output of occ.search is structured differently if mof is True.


occ.search(scientificname="Egregia menziesii", hasextensions="MeasurementOrFacts")


vs

occ.search(scientificname="Egregia menziesii", mof=True, hasextensions="MeasurementOrFacts")



When mof=False, the total number of results is included in a column, but when mof=True it is as if res = res["results"] has been performed.


The output should not change in this way when mof is True vs False.

occurrence record weirdnesses

While working on this notebook to analyze migration patterns of species over time and plot them over a world map, I found that many records contained unexpected eventDate (screenshot attached). Should we drop these fields?

[Bug] pyobis.occurrences: MoF throws error for species with no occurrence records

Overview

When accessing MoF records for species that do not have corresponding occurrence records, we encounter an unexpected error.

Steps to reproduce

  • Install pyobis
  • Run
import pandas as pd
from pyobis import occurrences as occ
res = occ.search(taxonid=124705, mof=True, hasextensions='MeasurementOrFact')
pd.concat(res)

Error

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    res = occ.search(taxonid=124705, mof=True, hasextensions='MeasurementOrFact')
  File "/mnt/d/programming/pyobis/pyobis/occurrences/occurrences.py", line 96, in search
    a = pd.merge(pd.DataFrame(out["results"]),mofNormalized,on='id',how='inner')
  File "/home/ayush/.local/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 107, in merge
    op = _MergeOperation(
  File "/home/ayush/.local/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 700, in __init__
    ) = self._get_merge_keys()
  File "/home/ayush/.local/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 1097, in _get_merge_keys
    right_keys.append(right._get_label_or_level_values(rk))
  File "/home/ayush/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 1840, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'id'
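The traceback suggests the merge runs even when the occurrence search returned no results, so there is no `id` column to merge on. A sketch of a guard (pure Python; the helper name and the early-return behavior are assumptions, not the current implementation):

```python
def merge_mof(results):
    """Guard sketch: bail out with an empty result when the occurrence
    search returned nothing, instead of attempting pd.merge on 'id'."""
    if not results:  # e.g. out["results"] == []
        return []
    # ...the real code would pd.merge(pd.DataFrame(results), mofNormalized, on="id")
    return results

print(merge_mof([]))  # → []
```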

dataset notebook to show off a data provider's data

grab datasets from list of ids:

datasets_of_interest = [
    "UUID_1",
    "UUID_2",
]
  • spatial distribution
  • temporal distribution
  • taxonomic distribution
  • what MoFs are included?
  • what makes this dataset unique?

Add capability to download Full OBIS Export of presence records?

OBIS provides a full export of presence records [1] in both csv [2] and parquet [3]. Would it be feasible to add something to pyobis that goes out and efficiently grabs the whole export?

I'm sure folks smarter than me know of ways to parallelize the call or do something fancy with the available services to download in a reasonable time frame.

[1] - https://obis.org/data/access/
[2] - https://obis-datasets.ams3.digitaloceanspaces.com/exports/obis_20220710.csv.zip
[3] - https://obis-datasets.ams3.digitaloceanspaces.com/exports/obis_20220710.parquet

Could also be a waste of time for this package.

Document units and direction for use of depth search parameters

If I wanted to do an occurrence search with a filter by depth, the docs don't currently tell me:

  1. What units should I use?
  2. What direction are we using?

For example, if I wanted to search between 10 and 50 meters depth, should I use:
start_depth = -10
end_depth = -50

Please update the docs to describe this.

To share some information, the Climate and Forecast conventions is what made me think of this nuance. The guidance is to specifically include units and direction with vertical coordinates. I don't think we need to handle conversions in the code, but we should be documenting what the expectation is.
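If the docs end up specifying meters, positive downward (the Darwin Core minimum/maximumDepthInMeters convention — an assumption here, to be verified against the OBIS docs), a small validation helper could encode that expectation:

```python
def depth_params(start_m: float, end_m: float) -> dict:
    """Sketch assuming depths are in meters, positive downward.
    Hypothetical helper; verify the convention against the OBIS docs."""
    if start_m < 0 or end_m < start_m:
        raise ValueError("expected 0 <= startdepth <= enddepth, in meters")
    return {"startdepth": start_m, "enddepth": end_m}

print(depth_params(10, 50))  # → {'startdepth': 10, 'enddepth': 50}
```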

data caching?

a nice feature to have would be to allow retention of cached data so that data does not need to be redownloaded.
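A minimal sketch of what this could look like: an on-disk JSON cache keyed by a hash of the request URL, so repeated queries skip the download (the cache location, helper name, and `fetch` callable are all assumptions for illustration):

```python
import hashlib
import json
import os
import tempfile

CACHE_DIR = tempfile.mkdtemp(prefix="pyobis_cache_")  # hypothetical location

def cached_fetch(url: str, fetch):
    """Return the cached response for `url` if present; otherwise call
    `fetch(url)` (any callable returning JSON-serializable data) and
    store the result on disk keyed by a hash of the URL."""
    key = hashlib.sha256(url.encode()).hexdigest()
    path = os.path.join(CACHE_DIR, key + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = fetch(url)
    with open(path, "w") as f:
        json.dump(data, f)
    return data

# demo with a fake fetcher: the second call is served from the cache
calls = []
def fake_fetch(url):
    calls.append(url)
    return {"total": 1}

cached_fetch("https://api.obis.org/v3/checklist?taxonid=1363", fake_fetch)
result = cached_fetch("https://api.obis.org/v3/checklist?taxonid=1363", fake_fetch)
print(len(calls), result)  # fetched only once
```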

Add capability to process DNADerivedData to `occurrences` module

Overview

The pyobis package in its current state does not support fetching DNADerivedData using the occurrences module.

I believe there can be two ways to implement it:

  • add it as a parameter in the search function like MeasurementOrFacts
  • create a new function for it say dna( self, **args, **kwargs ) and enable necessary processing and utility methods since search function is already heavy with so many features.

occurrence search using datasetid

I'm trying to rework an old notebook to use this package.

I have this piece of code:

from pyobis.occurrences import OccQuery

datasetid = '2ae2a2bd-8412-405b-8a9f-b71adc41d4c5'

occ = OccQuery()
dataset = occ.search(datasetid = datasetid)

but it didn't work - it took too long.

Here are the expected details from my other process:
OBIS Dataset page: https://obis.org/dataset/2ae2a2bd-8412-405b-8a9f-b71adc41d4c5
API request: https://api.obis.org/v3/occurrence?datasetid=2ae2a2bd-8412-405b-8a9f-b71adc41d4c5
Found 698900 records.

create visualization module (or separate package)

  • pyobis.visualize
    • create separate repo?
      • keeps pyobis dependencies light
      • makes viz library potentially usable w/ GBIF & other DwC data

text from proposal:

Developing an umbrella module inside the pyobis package that can be used directly
by the users to visualize data through existing modules. For eg., we can build a
top-level module, say pyobis.visualize which ingests JSON data from say
pyobis.occurrence and visualizes it based on some parameters like scientificname,
basisOfRecords, etc.

curated column subsets?

Working with occurrence data I feel overwhelmed by the number of columns.

Would it be a good idea to allow for easy subsetting of columns?

Here is some code in which I have done that:

# === select a subset of columns for improved table legibility :
shortened_df = df[[
    # time-related columns:
    "date_year", "endDayOfYear", "verbatimEventDate", "startDayOfYear", "dateIdentified", "eventTime", "date_mid",
    "eventDate", "month", "date_start", "date_end", "day", "year",
    # row identifier columns:
    "recordNumber", "ownerInstitutionCode", "parentEventID", "identifiedBy", "eventID", "collectionID", "organismID", "recordedBy", "datasetID", "category", "datasetName",
    "institutionCode", "occurrenceID", "collectionCode", "dataset_id", "id", "modified", "catalogNumber", "fieldNumber",
    "institutionID",
    # additional remarks:
    "occurrenceRemarks", "taxonRemarks", "eventRemarks", "samplingProtocol",
    "typeStatus", "preparations", "establishmentMeans", "dynamicProperties", "type",
    # occurrence specifics:
    "individualCount", "occurrenceStatus", "originalScientificName", "absence",
    "terrestrial", "basisOfRecord", "dropped",
]]

We could include arrays of curated column lists so that a user could do something like:

df = df[pyobis.column_subset.taxonomic + pyobis.column_subset.temporal]

To drop everything but the curated list of "taxonomic" and "temporal" columns. Thoughts?
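The idea above could be as simple as a module-level mapping of curated column groups (group names and membership here are illustrative, taken from the listing above; `column_subset` is a hypothetical pyobis attribute):

```python
# curated column groups a hypothetical pyobis.column_subset could expose
column_subset = {
    "temporal": ["eventDate", "date_start", "date_end", "year", "month", "day"],
    "identifiers": ["occurrenceID", "datasetID", "institutionCode", "id"],
}

wanted = column_subset["temporal"] + column_subset["identifiers"]
# df = df[[c for c in wanted if c in df.columns]]  # tolerate absent columns
print(wanted)
```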

Undocumented `countryid` parameter for `occurrences/grid` endpoint

Overview

While studying the network logs of the OBIS mapper tool, I noticed an undocumented input parameter countryid being used to gather occurrence records published by a particular country via the /occurrence/grid endpoint of the OBIS API. This parameter is absent from the Swagger docs.

Screenshot of the network log


Response

It returns all the occurrence records for a particular country from its countryid code.

Questions

Should we add this parameter to pyobis?

Maintenance and pyobis future

Hi @sckott, are you still interested in maintaining pyobis? There is a growing community of Pythonistas that are using OBIS and would like to update and curate this package. Would you be interested in adding more committers and/or moving it to another organization?

create 1st demo jupyter notebook

  • what are "use cases" for each notebook?
    • outline different use-cases for pyobis
  • search for relevant papers using OBIS|GBIF data & reproduce results in jupyter
  • see how species are moving downwards due to climate change.
    • search MoF for depth measurements & report on % that have it.
    • relevant paper
  • see how species are shifting north/south due to climate change.
  • can potentially publish notebooks to JOSS
  • we should also set up these notebooks to work with mybinder.org
  • should the notebooks go in this repo or should we place them elsewhere?
    • create a "notebooks" directory in the root dir
  • add this notebook

ongoing sunfish analysis in dev on this branch

Use CI push to PyPI

  • Use CI push to PyPI
    • discussed w/ Filipe : used to do it w/ Travis CI, need to switch to gh-actions

Expand README to include more information

Overview

Since pyobis is an open-source project, there is a strong need to expand the README to include information related to, but not limited to:

  • Data quality issue reporting: if a user finds issues with the data fetched through pyobis, their natural instinct will be to open an issue here. The README must therefore point to the right place to report data issues.
  • Data-related queries: if a user has questions about the data returned by pyobis, say about the fields returned with occurrence records, it would be wise to include links to the OBIS main page, contact page, or a forum.
  • Update short examples contained in the README
  • Add link to usage_guide.ipynb and to /notebooks/ folder for sample use case-analysis
  • Other changes: Changing the links that have gone obsolete, adding Further Reading material, adding link to CONTRIBUTING.md guide.

Please suggest if there are any other things that need to be added to the README.

/docs/ directory update (or remove?)

/docs/ looks to contain sphinx documentation. I am not familiar with building documentation like this. Can we add the docs build to a github action? Should we update to another tooling?

[Bug] pyobis.occurrences: MoF results DataFrame shows duplicate columns.

Overview

When we query MoF records using occurrences.search(mof=True, hasextensions='MeasurementOrFact'), the result contains duplicate columns -> ['scientificName', 'eventID']. This issue is a direct result of the implementation at line 95.

Steps to reproduce

  • Install pyobis
  • Run
    df = pd.concat(occurrences.search(mof=True, hasextensions='MeasurementOrFact'))
    df.columns # shows duplicate columns with the same content, named scientificName_x and scientificName_y; similar for eventID
    

GSoC meeting planning & agenda

In June we will begin a set of weekly meetings to work on a project supported by Google Summer of Code. This issue is here to discuss plans for the first of these meetings. Expected attendees include me (Tylar Murray), Matt (@MathewBiddle), and Ayush (@ayushanand18). Let's discuss scheduling on the slack #gsoc_pyobis channel.

agenda

  • introductions
    • what you are hoping to get out of involvement in this project
  • discuss TDD as described in the workflow 1-pager
  • begin working on strategic plan as described in the management 1-pager
  • set up todo items before next meeting

improve MoF accessibility

GSoC objective

  • Make MoF response more efficient
    • make it return a list of dataframes so it's more usable
    • currently the response is nested: each occurrence has a nested measurement-or-fact table
    • some trick to use pandas to pull out the nested part (json_normalize?)
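The flattening step can be sketched in pure Python: one flat row per (occurrence, measurement) pair, similar to what `pandas.json_normalize` with a `record_path` would produce (the `mof` key name and helper are assumptions about the response shape):

```python
def flatten_mof(records):
    """Pull nested MeasurementOrFact rows out of each occurrence record,
    merging the parent occurrence fields into every measurement row."""
    rows = []
    for rec in records:
        base = {k: v for k, v in rec.items() if k != "mof"}
        for m in rec.get("mof") or [{}]:
            rows.append({**base, **m})
    return rows

occ = [{"id": "a", "mof": [{"measurementType": "depth", "measurementValue": 5}]}]
print(flatten_mof(occ))
# → [{'id': 'a', 'measurementType': 'depth', 'measurementValue': 5}]
```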
