Comments (8)

damianooldoni commented on August 16, 2024

Here is the most interesting part of @timadriaens's comment in #270 (duplicate):

this should be done in such a way that it can be selected per dataset (i.e. user gets option to select from a list based on what is in that field for the datasets selected).

So, he proposes a dynamic filter that restricts the values of identificationVerificationStatus to those present in the selected datasets.

@niconoe: I think this would be quite a big change to the filter mechanism.

My idea: keep it as simple as possible.

First of all: explore! We should first get an idea of what we are talking about. So here are the values of identificationVerificationStatus for the data shown, i.e. the data downloaded from GBIF with valid coordinates:

| identificationVerificationStatus | n (occurrences) |
|---|---|
| approved on knowledge rules | 173075 |
| NA | 109812 |
| approved on photographic evidence | 56510 |
| unverified | 34959 |
| approved on expert judgement | 22100 |
| validated with document | 6035 |
| not validated | 4011 |
| validated on the basis of rules | 1940 |
| validated without document | 1151 |
| verified by experts | 766 |
| verified | 367 |
| validated on the basis of a document | 213 |
| validated without a document in support (expertise or additional informations) | 100 |
| under validation | 50 |
| validated on the basis of likelihood | 18 |
| Accepted | 5 |
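
For reference, a tally like the one above could be reproduced with a few lines of Python. This is a minimal sketch, assuming the status values have already been pulled out of the GBIF download into a plain list (one entry per occurrence, `None` for an empty field); the function name `tally_statuses` is hypothetical and not part of gbif-alert.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def tally_statuses(statuses: Iterable[Optional[str]]) -> List[Tuple[str, int]]:
    """Count each identificationVerificationStatus value, most frequent first.

    Missing values (None or empty/blank strings) are reported as "NA",
    as in the table above.
    """
    counts = Counter(
        s if s is not None and s.strip() else "NA" for s in statuses
    )
    return counts.most_common()


# Hypothetical sample, not real GBIF data.
sample = ["unverified", None, "approved on knowledge rules", None, "unverified", None]
for value, n in tally_statuses(sample):
    print(value, n)
```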

I have the feeling that we can group things quite easily:

| identificationVerificationStatus | value to show in filter |
|---|---|
| starts with approved, validated or accepted | verified |
| NA | not available (showing NA is also possible) |
| unverified, not validated | unverified |
| any other value (at the moment no other values present) | other |

Note that the more rules we add (e.g. see the comments starting from #43 (comment)), the more difficult they are to maintain in the long term. In particular, mapping the verification status based on the dataset an occurrence belongs to can be dangerous, as such data could change in the future.

My proposal is:

  1. easy to understand
  2. easy to document
  3. easy to detect whether new values of identificationVerificationStatus are present: if selecting other returns more than zero occurrences, new values have appeared
  4. easy to expand: we could even drop the grouping I proposed and show all 16 options (plus the other option for possible new values in the future). But I prefer not to, as it could be overwhelming for the typical user
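
A minimal Python sketch of the proposed grouping, assuming the raw value is available as a string (or `None` when the field is empty); `filter_group` and the constant names are hypothetical. Matching is case-insensitive, which is an assumption made so that "Accepted" is covered. Applied literally, the first row of the grouping table does not catch "verified by experts", "verified" or "under validation", so those three values (1,183 occurrences in the table above) would currently show up as other; the first row may need a "verified" prefix, and "under validation" needs an explicit decision.

```python
from typing import Optional

# Taken literally from the grouping table: values starting with one of these
# words (compared case-insensitively, an assumption) are shown as "verified".
VERIFIED_PREFIXES = ("approved", "validated", "accepted")
UNVERIFIED_VALUES = {"unverified", "not validated"}


def filter_group(status: Optional[str]) -> str:
    """Map a raw identificationVerificationStatus to the value shown in the filter."""
    if status is None or not status.strip():
        return "not available"
    normalized = status.strip().lower()  # treats "Accepted" like "accepted"
    if normalized in UNVERIFIED_VALUES:
        return "unverified"
    if normalized.startswith(VERIFIED_PREFIXES):
        return "verified"
    return "other"  # anything not covered by the rules lands here


observed = ["approved on knowledge rules", None, "Accepted", "not validated", "verified by experts"]
print([filter_group(s) for s in observed])

# Point 3: a non-zero "other" count signals values the grouping does not cover yet.
other_count = sum(1 for s in observed if filter_group(s) == "other")
print(other_count)
```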

from gbif-alert.

timadriaens commented on August 16, 2024

else if data provider is "DEMNA"
    => mark all occurrences as "validated"
else if dwc:identificationVerificationStatus is "Approved by expert judgement" or "Approved by autovalidation" or "Approved on photographic evidence"
    => mark the occurrence as "validated"
if dwc:identificationVerificationStatus is "Unverified"
    => mark the occurrence as "non-validated"

damianooldoni commented on August 16, 2024

Nico, indeed. We should provide guidelines to turn this field into something close to a controlled vocabulary. However, as you say, there are datasets with relevant observations published outside RIPARIAS. Of course the user should be able to filter out these datasets. Still, I am afraid we will need a human-controlled mapping of this field's values for the whole project, and therefore a decision on what we consider a validated observation and what we do not. You can definitely assign me to this task, if we agree.

timadriaens commented on August 16, 2024

Hi, I still think providing a filter on this is a good idea. Could we explore the values across the different datasets a bit? Also, I guess many datasets do not have that field filled; if they come from INBO, we could probably consider them validated and fill that field at dataset level? We know quite well how we mapped the validation status of wnm.be data (validated based on evidence, based on probability...), so that filter would already be useful, as this concerns one of the biggest and most regularly republished datasets, and the one most relevant to the alerts. Note that iNaturalist only pushes validated observations to GBIF, so no problem there.

niconoe commented on August 16, 2024

Hi everyone, I agree that showing/filtering per validation status would be nice, and there is no technical issue at all with that. But we need a clear-cut rule so the system can decide whether a given observation is validated or not. The rule can be moderately complex; the important point is that it is unambiguous and provides decent results for all target occurrences (to avoid confusing users: misleading information is probably worse than no information). Here is a first draft based on the discussion above, please improve it. Once we have a consensus, it can be implemented in the alert tool:

if dataset is "iNaturalist"
    => mark all occurrences as "validated"
else if data provider is "INBO"
    => mark all occurrences as "validated"
else if dwc:identificationVerificationStatus is "verified" or "1"  # we need a good consensus for a criterion like that, this is just an example
    => mark the occurrence as "validated"
else (by default)
    => mark the occurrence as "non-validated"

In other words, I'd be happy to implement something like that, but there are questions to be solved on the "data front" before it can be done. Tell me what you think!
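
The draft rules translate almost directly into an ordered chain of checks where the first matching rule wins. The sketch below only illustrates that structure, assuming each occurrence exposes its dataset name, data provider and raw status; the `Occurrence` class, its field names and `validation_label` are hypothetical and not the actual gbif-alert model. Matching the dataset on the exact name "iNaturalist" is a placeholder (a GBIF dataset key would likely be a more stable identifier), and the case-insensitive status comparison is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder criterion copied from the draft, which says it still needs consensus.
VALIDATED_STATUS_VALUES = {"verified", "1"}


@dataclass
class Occurrence:
    # Hypothetical fields, for illustration only.
    dataset_name: str
    data_provider: str
    identification_verification_status: Optional[str] = None


def validation_label(occ: Occurrence) -> str:
    """Apply the draft rules in order; the first matching rule wins."""
    if occ.dataset_name == "iNaturalist":
        return "validated"
    if occ.data_provider == "INBO":
        return "validated"
    status = occ.identification_verification_status
    if status is not None and status.strip().lower() in VALIDATED_STATUS_VALUES:
        return "validated"
    return "non-validated"  # default branch


print(validation_label(Occurrence("iNaturalist", "Example provider")))
print(validation_label(Occurrence("Example dataset", "Example provider", "Verified")))
print(validation_label(Occurrence("Example dataset", "Example provider", "unverified")))
```

Keeping the rules as an explicit, ordered chain makes the precedence visible, which matters once dataset-level and record-level rules disagree.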

timadriaens commented on August 16, 2024

Maybe we should build a datasetName x identificationVerificationStatus matrix so we can see all current values (and NAs).
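
A minimal sketch of such a matrix in plain Python, assuming (datasetName, identificationVerificationStatus) pairs have already been extracted from the occurrence download; `status_matrix` and the dataset names in the example are hypothetical. Including zero cells makes it easy to spot which datasets never use a given value. If pandas is already in use, `pandas.crosstab` builds a comparable table.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def status_matrix(rows: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, Dict[str, int]]:
    """Cross-tabulate datasetName x identificationVerificationStatus.

    Every dataset gets a count for every status seen anywhere (0 if absent);
    missing statuses are reported as "NA".
    """
    cells = Counter(
        (dataset, status if status is not None and status.strip() else "NA")
        for dataset, status in rows
    )
    datasets = sorted({d for d, _ in cells})
    statuses = sorted({s for _, s in cells})
    return {d: {s: cells[(d, s)] for s in statuses} for d in datasets}


rows = [
    ("Dataset A", "unverified"),
    ("Dataset A", None),
    ("Dataset B", "Accepted"),
    ("Dataset A", "unverified"),
]
for dataset, counts in status_matrix(rows).items():
    print(dataset, counts)
```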

mcoupremanne commented on August 16, 2024

@timadriaens, my 2 cents:

We typically have 3 types of validation for the DEMNA-based databases (on evidence, expertise, or reliability). I tried to get a homogeneous vocabulary across our GBIF datasets, but:
• For RIPARIAS we also include fresh records without validation; otherwise the warning would not be early at all.
• It does not cover data collected by other Walloon partners.
• I can adapt the vocabulary if we agree on controlled terms. That would perhaps make information uptake and future data sharing easier.

timadriaens commented on August 16, 2024

I agree with this of course
