globalnamesarchitecture / dwca_hunter
Downloads biodiversity resources from internet and converts them to DarwinCore Archive files
License: MIT License
From gnames/gnverifier#62 by @abubelinha
WCVP (World Catalog of Vascular Plants)
website http://wcvp.science.kew.org/
dataset http://sftp.kew.org/pub/data-repositories/WCVP/
article https://doi.org/10.1038/s41597-021-00997-6
There are HTML entities instead of characters, and long English annotations instead of synonyms.
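The entity problem described above could be handled during conversion with a small cleanup step. This is a minimal sketch in Python, not code from dwca_hunter; the function name `clean_name` is hypothetical, and the sample string is only an illustration of an entity-encoded name.

```python
import html


def clean_name(raw: str) -> str:
    """Decode HTML entities and collapse runs of whitespace in a name string."""
    decoded = html.unescape(raw)
    # split() with no arguments also splits on non-breaking spaces,
    # which often appear once "&nbsp;" has been decoded.
    return " ".join(decoded.split())


if __name__ == "__main__":
    # Illustrative input only: a hybrid name with the multiplication sign encoded.
    print(clean_name("Quercus &times; hispanica"))
```

The long English annotations mentioned in the same issue are a separate problem and are not addressed by this sketch.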
Database artifacts (any taxon name with the word 'artifact' in the unacceptability_reason field) are errors in ITIS, and I think they should be ignored when creating the DwC-A file.
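The filter suggested above might look like the following sketch. It assumes each ITIS record is available as a dictionary with an `unacceptability_reason` key (the field named in the issue); the function names and the sample rows are hypothetical and are not taken from the dwca_hunter code base or from ITIS data.

```python
def is_database_artifact(row: dict) -> bool:
    """Return True if the record's unacceptability_reason mentions 'artifact'."""
    reason = row.get("unacceptability_reason") or ""
    return "artifact" in reason.lower()


def drop_artifacts(rows):
    """Keep only records that are not flagged as database artifacts."""
    return [row for row in rows if not is_database_artifact(row)]


if __name__ == "__main__":
    # Hypothetical rows for illustration; the tsn values are placeholders.
    sample = [
        {"tsn": 1, "unacceptability_reason": None},
        {"tsn": 2, "unacceptability_reason": "junior synonym"},
        {"tsn": 3, "unacceptability_reason": "database artifact"},
    ]
    print([row["tsn"] for row in drop_artifacts(sample)])
```

Matching is case-insensitive so that variants such as "Database artifact" are caught as well.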
hi @dimus et al.
Are you still maintaining globalnames.org ?
If so, I was hoping you can consider the following:
Plazi https://plazi.org keeps an extensive list of taxonomic literature and associated taxonomic names.
Plazi exports these taxonomic literature <> name links as DwC-A and registers them with GBIF.
You can find their publications at https://www.gbif.org/occurrence/search?dataset_key=6384b520-7e9f-4874-a414-76c2e9b01d74&type_status=TYPE .
As a user (GloBI), I would like to be able to use the Global Names resolvers to find taxonomic treatments in Plazi.
The taxonomic treatments can be located by linking the TaxonId fields that are available in the Plazi publications.
For example, when I look up Rhinolophus denti, I expect to find a Plazi name with id 885887A2FFC88A21F8B1FA48FB92DD65.taxon (also see https://www.gbif.org/occurrence/2597533915). This identifier can then be translated into a link to the related taxonomic treatment via http://treatment.plazi.org/id/885887A2FFC88A21F8B1FA48FB92DD65 .
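The translation from a Plazi taxon id to a treatment link can be sketched as follows. It assumes, based only on the single example in this issue, that the treatment id is the taxon id with its `.taxon` suffix removed; the function name `treatment_url` is hypothetical.

```python
TREATMENT_BASE = "http://treatment.plazi.org/id/"
TAXON_SUFFIX = ".taxon"


def treatment_url(taxon_id: str) -> str:
    """Build a treatment link from a Plazi taxon id such as '<id>.taxon'."""
    if taxon_id.endswith(TAXON_SUFFIX):
        # Assumption: dropping the suffix yields the treatment id.
        taxon_id = taxon_id[: -len(TAXON_SUFFIX)]
    return TREATMENT_BASE + taxon_id


if __name__ == "__main__":
    print(treatment_url("885887A2FFC88A21F8B1FA48FB92DD65.taxon"))
```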
See also attached screenshots.
fyi @myrmoteras
@ccicero, @dustymc, I started new imports and will notify you on every ticket; this way it will be easier for you to see their status.
IOC World Bird List - https://www.worldbirdnames.org/ioc-lists/master-list-2/
FYI @dustymc, @ccicero
AOS Checklist - http://checklist.americanornithology.org/taxa/
FYI @dustymc @ccicero
Clements Checklist - https://www.birds.cornell.edu/clementschecklist/download/
As a User I want to see the ASM Mammalian Diversity Database appear as a data source in https://resolver.globalnames.org and related services .
It appears that the https://mammaldiversity.org resource is being used among mammal researchers.
I was able to download the attached dump mammal.json.gz using:
curl 'https://mammaldiversity.org/species-account/api.php?q=*' -H 'User-Agent:' | gzip > mammal.json.gz
Note that the -H 'User-Agent:' option, which makes curl omit its default User-Agent header, was somehow needed.
There is a new version of Arctos available; we need to harvest it and give @dustymc feedback. See ArctosDB/arctos#3205
In the synonym_links file, a single TSN might be linked to multiple TSNs (see TSN 103337 for an example). It's not clear what the best DwC-A representation of this might be.
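Before deciding on a representation, it may help to list which TSNs are affected. The sketch below assumes the synonym_links rows have been read as `(tsn, tsn_accepted)` pairs; that pairing, the function name, and the sample numbers are assumptions for illustration and are not real ITIS data.

```python
from collections import defaultdict


def multi_accepted(links):
    """Map each TSN that points at more than one accepted TSN to its targets."""
    targets_by_tsn = defaultdict(set)
    for tsn, tsn_accepted in links:
        targets_by_tsn[tsn].add(tsn_accepted)
    return {
        tsn: sorted(targets)
        for tsn, targets in targets_by_tsn.items()
        if len(targets) > 1
    }


if __name__ == "__main__":
    # Placeholder pairs: TSN 10 is linked to two accepted TSNs, TSN 11 to one.
    sample_links = [(10, 100), (10, 101), (11, 100)]
    print(multi_accepted(sample_links))
```

The resulting list shows how common the one-to-many case is, which is useful input for choosing between, for example, emitting one row per link or picking a single accepted name.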
When inspecting the JSON results for matches of Enhydra lutris against Mammal Species of the World (data source id 174), taxon_id 28576 and internal_id 14001090 (shown as local_id in the record below) are found. However, it appears that the internal_id 14001090 is the identifier that the Mammal Species of the World exposes to link to their taxon pages (e.g., https://www.departments.bucknell.edu/biology/resources/msw3/browse.asp?id=14001090). For some reason, the taxon_id 28576 cannot be found in any of the Mammal Species of the World data products.
This suggests that the (current) internal_id values are suitable to be used as taxon ids (incl. in the classification path ids hierarchy).
data_source_id | 174 |
data_source_title | "The Mammal Species of The World" |
gni_uuid | "3096feea-1216-5f59-ab70-fcff3492cef6" |
name_string | "Enhydra lutris Linnaeus 1758" |
canonical_form | "Enhydra lutris" |
classification_path | "Mammalia|Carnivora|Caniformia|Mustelidae|Lutrinae|Enhydra|Enhydra lutris" |
classification_path_ranks | "class|order|suborder|family|subfamily|genus|species" |
classification_path_ids | "1|25367|27246|28538|28539|28575|28576" |
taxon_id | "28576" |
local_id | "14001090" |
edit_distance | 0 |
imported_at | "2018-08-04T20:50:18Z" |
match_type | 2 |
match_value | "Exact match by canonical form" |
prescore | "3|0|0" |
score | 0.988 |
29 |
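To make the structure of the record above easier to inspect, the three pipe-delimited classification fields can be aligned position by position. This is a minimal sketch that assumes the fields always have the same number of segments, as they do in this record; the function name is hypothetical.

```python
def parse_classification(path: str, ranks: str, ids: str):
    """Align classification_path, _ranks and _ids into one list of nodes."""
    names = path.split("|")
    rank_list = ranks.split("|")
    id_list = ids.split("|")
    if not (len(names) == len(rank_list) == len(id_list)):
        raise ValueError("classification fields have different lengths")
    return [
        {"name": name, "rank": rank, "id": node_id}
        for name, rank, node_id in zip(names, rank_list, id_list)
    ]


if __name__ == "__main__":
    nodes = parse_classification(
        "Mammalia|Carnivora|Caniformia|Mustelidae|Lutrinae|Enhydra|Enhydra lutris",
        "class|order|suborder|family|subfamily|genus|species",
        "1|25367|27246|28538|28539|28575|28576",
    )
    for node in nodes:
        print(node["rank"], node["name"], node["id"])
```

In this record the last path id matches taxon_id (28576) rather than local_id (14001090), which is the mismatch the issue describes.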
At the moment, get_ranks uses only the rank_id as the key when storing rank names. However, ITIS reuses rank_ids for differently named ranks in different kingdoms (e.g. "phylum" has kingdom_id=1, rank_id=30, while "division" has kingdom_id=3, rank_id=30). Using only rank_ids therefore forces everybody onto the ranks defined for kingdom Chromista (kingdom_id=6).
This should be a pretty easy fix: every piece of code that uses @Ranks should use both rank_id and kingdom_id when looking up the appropriate term. @Ranks is only used on three lines.
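The idea of the fix can be illustrated with a lookup keyed on both ids. This is a language-neutral sketch in Python, not the project's Ruby code; it assumes rank rows are available as dictionaries with `kingdom_id`, `rank_id` and `rank_name` keys, and it uses only the two rank rows quoted in the issue.

```python
def build_rank_lookup(rows):
    """Key rank names by (kingdom_id, rank_id) so kingdoms do not collide."""
    return {
        (row["kingdom_id"], row["rank_id"]): row["rank_name"].strip().lower()
        for row in rows
    }


if __name__ == "__main__":
    rows = [
        {"kingdom_id": 1, "rank_id": 30, "rank_name": "Phylum"},
        {"kingdom_id": 3, "rank_id": 30, "rank_name": "Division"},
    ]
    ranks = build_rank_lookup(rows)
    print(ranks[(1, 30)], ranks[(3, 30)])

    # Keying by rank_id alone keeps only the last row seen for each id.
    by_rank_id_only = {row["rank_id"]: row["rank_name"] for row in rows}
    print(len(by_rank_id_only))
```

With a single-key lookup, whichever kingdom is loaded last overwrites the others, which matches the behaviour described above.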
Paul Kirk sent a new dump of Index Fungorum; it has to be converted to DwC-A and imported.
The import format changed; it is now a CSV file.
@Archilegt wrote:
The data is downloadable in DwC. See the copy in GBIF for reference: http://www.gbif-uat.org/dataset/61e2d02a-34f7-4705-8840-c1ee49dfd951
@Archilegt, do you know if all your names get ingested into the GBIF taxonomic backbone? If yes, I already get them through the GBIF Darwin Core file.
dwcahunter list should return only resources that do convert
FYI @ccicero, @dustymc
Howard and Moore Checklist - https://www.howardandmoore.org/howard-and-moore-database/
@dimus
The Arctos community would like to explore the possibility of adding ectoparasite taxonomy in use by institutions participating in the Parasite Tracker TCN to Global Names. The TPT is working to assemble taxonomy reference files for major groups of ectoparasites, with the names and classifications in csv format. Please let me know what steps would be involved with integrating these with Global Names.
It looks like the last harvest of Wikispecies failed. It needs to be rerun.
Requested at gnames/gnverifier#99
Change the project to be a gem
As @jhpoelen mentioned in #30, it is possible to get more metadata associated with names via https://github.com/plazi/treatments-rdf .
Reharvest Plazi using this source instead of http://tb.plazi.org/GgServer/xml.rss.xml