
pyobistools's Introduction

pyobistools - Tools for data enhancement and quality control - but in Python

Python port of OBIS's obistools QC R package

Documentation available here

Other relevant OBIS Python modules:

  • pyobis - retrieving data from OBIS for consumption/usage in analyses - in Python

  • robis - retrieving data from OBIS for consumption/usage in analyses - in R

  • obis-qc - database-wide quality control of OBIS data held at OBIS - in R

pyobistools's People

Contributors

beauvilliers, jdpye, softwaremonk, sauve, kwilcox, jeffcullis, germainsauve, naomitress

pyobistools's Issues

Logos in README for OGSL / DFO

Crediting OGSL and DFO for the initial effort should come with a visual aid. Perhaps we should add their logos to the README, especially if neither logo file is too large.

check_scientificname_and_ids itis_usage update idea

When a user passes 'names_taxons_ids' to check_scientificname_and_ids and obtains a WoRMS result for a row that has an ITIS LSID, the row is returned as a no match. The validation compares the WoRMS LSID with the ITIS LSID, so the row fails even though the ITIS LSID may be valid.
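One possible fix is to compare LSIDs only when they come from the same authority. Pairs from different authorities would be reported separately instead of being counted as mismatches. The sketch below assumes LSIDs of the form `urn:lsid:<authority>:<namespace>:<object_id>`, for example `urn:lsid:marinespecies.org:taxname:...` for WoRMS and `urn:lsid:itis.gov:itis_tsn:...` for ITIS. The function names `compare_lsids` and `_lsid_parts` are hypothetical and are not part of pyobistools.

```python
from typing import List, Optional


def _lsid_parts(lsid: str) -> Optional[List[str]]:
    """Split an LSID into its parts, or return None if it is not an LSID (hypothetical helper)."""
    parts = lsid.strip().split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        return None
    # Treat "urn", "lsid", authority and namespace as case-insensitive;
    # keep the object ID (and any revision) exactly as given.
    return [p.lower() for p in parts[:4]] + parts[4:]


def compare_lsids(provided: str, resolved: str) -> str:
    """Classify a user-provided LSID against the LSID returned by a name lookup.

    Returns 'match', 'mismatch', 'different_authority' or 'invalid'.
    """
    p, r = _lsid_parts(provided), _lsid_parts(resolved)
    if p is None or r is None:
        return "invalid"
    if p[2] != r[2]:
        # e.g. an ITIS LSID vs a WoRMS LSID: a WoRMS answer can neither
        # confirm nor refute an ITIS identifier, so don't call it a mismatch.
        return "different_authority"
    return "match" if p == r else "mismatch"
```

With illustrative IDs, `compare_lsids("urn:lsid:itis.gov:itis_tsn:123", "urn:lsid:marinespecies.org:taxname:456")` gives `"different_authority"` rather than a no match. Such rows could then be checked against ITIS itself.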

Setup linting / testing infrastructure

Tracking the testing infrastructure setup using pytest.

  • Standardize on linting rules and set up configuration files in the repository
  • Notes in docs/README on how to set up and run the tests/linting locally
  • Run tests/lints using a GH Action on each push/PR

Migration of logic functions into the pyobistools module folder

The aim is for the pyobistools module to contain functions that perform roughly the same tasks as the R package functions. The existing implementations/decorators would then call those functions and keep performing their current tasks seamlessly.

First order functions to write/migrate:

taxa.py and validations/check_scientificname_and_ids.py have overlapping functionality

Looks like the OBIS / WoRMS checks are replicated across these two files.

taxa.py works against OBIS's API but doesn't implement anything for ITIS; it only has an empty placeholder function.

I think taxa.py tries to name and place things where an R obistools user would expect them. The check_scientificname_and_ids.py implementation, however, is far more advanced and actually works.

Suggestions on how to move forward? We have three options:

  • migrate the functionality of check_scientificname_and_ids.py into the naming scheme of taxa.py
  • rename the check_scientificname_and_ids.py functions in place
  • choose not to stay consistent with the R implementation
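The first option keeps a single implementation under the R-style name and leaves the old name as a thin deprecated shim, so existing callers do not break. The sketch below is hypothetical. `match_taxa` is named after obistools::match_taxa in R. The signatures and the stub body are assumptions: the real check_scientificname_and_ids almost certainly takes different arguments and queries WoRMS.

```python
import warnings
from typing import Dict, List, Sequence


def match_taxa(names: Sequence[str]) -> List[Dict[str, object]]:
    """R-style entry point (hypothetical signature).

    In a real migration, the body of check_scientificname_and_ids.py would
    move here; this stub just returns one unresolved row per name.
    """
    return [{"scientificName": name, "matched": False} for name in names]


def check_scientificname_and_ids(names: Sequence[str]) -> List[Dict[str, object]]:
    """Old name kept as a deprecated shim that delegates to match_taxa."""
    warnings.warn(
        "check_scientificname_and_ids is deprecated; use match_taxa instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this shim
    )
    return match_taxa(names)
```

Delegating in one direction means there is a single code path to test and fix. The warning gives downstream users time to switch names before the shim is removed.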

Register dependencies with PyPI

pyobis and xylookup aren't registered with any package index. Registering them where they can be installed automatically would help us use them as dependencies.

modify check_scientificname_and_ids to accommodate long WoRMS responses

When WoRMS is queried with a scientific name, the response may contain more than one record. Currently, the function returns the first record. The updated function should check the status of the record it returns to make sure it is accepted.

A check can be made with scientific name 'Monstrilla grandis' to see how the function currently works.
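A minimal sketch of the selection step follows. It assumes each WoRMS record is a dict with a `status` field such as `"accepted"` or `"unaccepted"`, as in the AphiaRecord objects returned by the WoRMS REST service. The function name `pick_worms_record` is hypothetical, and the AphiaIDs in the test data are illustrative, not real WoRMS responses.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def pick_worms_record(records: List[Record]) -> Optional[Record]:
    """Return the single accepted record from a WoRMS name lookup, or None.

    None means either no records, no accepted record, or several accepted
    records (e.g. homonyms); such rows should be flagged for review rather
    than silently resolved to the first answer.
    """
    accepted = [r for r in records if str(r.get("status", "")).lower() == "accepted"]
    if len(accepted) == 1:
        return accepted[0]
    return None
```

Returning None for ambiguous results is a deliberate choice in this sketch. An alternative is to follow the `valid_AphiaID` of an unaccepted record to its accepted name, at the cost of an extra request per row.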

Support the latest stable Pandas 2.x

This will be a bit of an ongoing saga: some tests do not behave as expected with the latest Pandas. We need to find and pin a stable but fresher version of Pandas that doesn't break the current codebase.

If the Pandas expected behaviour changes, we will change with it at that point.
