vertnet / dwcvocabs Goto Github PK
View Code? Open in Web Editor NEWReal-world values for Darwin Core terms
License: GNU General Public License v2.0
Real-world values for Darwin Core terms
License: GNU General Public License v2.0
A "smart step" needs to be updated to permit "Maryland" appear in the vocabularies when it is given as data via verbatimStateProvince.
Change geography resolution to NOT include more than one value in these fields.
Latest recommendation about how to populate stateProvince and county fields when there is more than one value is to leave it BLANK and to capture this in the locality field, prepend to whatever is there already.
See:
https://github.com/tdwg/dwc-qa/wiki/Multiple-values-in-one-field
In Vocabularies folder in Dropbox.
Also, please change that you have incorporated the UMZM vocabs still in that folder. I believe you did, but if not, merge those, too.
Then delete 'em.
Changes were made in 2015. See https://en.wikipedia.org/wiki/List_of_districts_of_Nepal
Especially basisOfRecord. This will enable mapping to RDF or OWL where the URI is the functional identifier rather than the label.
All in Dropbox>Vocabularies.
These include past vocabs that did not merge completely as well as several new sets from recent datasets.
We are not using this. If one wanted to know if it is standard or not, one could check that the verbatim fields are the same as the standardized fields.
See
https://en.wikipedia.org/wiki/List_of_transcontinental_countries
https://en.wikipedia.org/wiki/Boundaries_between_the_continents_of_Earth#The_Americas,_Australia_and_Oceania
https://en.wikipedia.org/wiki/List_of_transcontinental_countries#/media/File:VE-Dependencias_Federales_ubicacion.png
https://en.wikipedia.org/wiki/Boundaries_between_the_continents_of_Earth#/media/File:Map_of_Sunda_and_Sahul.png
https://en.wikipedia.org/wiki/Federal_districts_of_Russia
https://en.wikipedia.org/wiki/European_Russia#Alignment_with_administrative_divisions
a compilation of lists:
https://github.com/iDigBio/idb-backend/blob/43130d8ad65867436d8f0ba7b7ddd6ee0fc4de44/idb/data_tables/locality_data.py
comprises:
2-3 char ISO countryCode mapping
'none' options
continent
country
stateProvince
major bodies of water
country string to ISO code
https://doi.pangaea.de/10.1594/PANGAEA.777975
Also, Renzo provided shapes for the IHO waterbodies.
Oaxaca has 570 municipalities divided in districts. Revise Oaxaca geography to include districts at the county level and municipalities at the municipality level.
Since 2015 the first level is again the same 6 areas (now provinces) --- so the first level was current in GADM.
Subdivided into 24 prefectures and 117 districts (pretty much as it was, but with new level type names).
Hi, im a native spanish speaker, and i can add a csv file of the dwc terms based on your DarwinCloud.csv file or edit it as you wish.
i was searching and a lot of terms were missed, i' ll gladly help. greetings from chile
In Dropbox>Vocabularies:
ALMNH PaleoBotany
APSU MemphisHerps
SIO MarineVerts
The SIO data is a priority for me so that I can rerun this large data set against the newest vocabs.
This table is used to automatically fill in county-level information from the combination of country, stateProvince, and verbatimCounty for combinations that have been seen and checked before. As of today 1266 combinations outside of North America need to be checked.
For records that have verbatimIsland or verbatimIslandGroup populated, transfer these values to island and islandGroup as appropriate, removing the word "Island" from the island field except where "Island" is an essential part of the name (e.g., "Long Island").
Geography for Russia is inconsistent in terms of the forms of names chosen. At times they are English interpretations, at times, transliterations of the Russian.
Make the error patterns consistent with documentation in README.md (https://github.com/VertNet/DwCVocabs/blob/master/README.md). It is also worth making the incorrectable and notHigherGeography fields complete and consistent.
...so that only Master needs to be downloaded each time. Right now the geog tables in Manager change with each new addition of a data set's vocabs.
Marine Regions has hierarchical waterbody information at http://www.marineregions.org/gazetteer.php. use this to make the waterbody field of the geography lookup asspecific as possible.
Female is not present in the mapping so I believe it is getting passed through as-is ("Female" gets published rather than "female").
Sample:
http://portal.vertnet.org/o/ku/kui?id=21259
Download:
Your VertNet download file is now available for a limited time at
https://storage.googleapis.com/vn-downloads2/vertnet_female_specimens-4ca385f9dd4c425d92ed7cf9151bd394.tsv.
Query: sex:female
Matching records: 61000
Request submitted: 2016-01-14T18:49:00.376040
Request fulfilled: 2016-01-14T19:04:16.854560
In Dropbox>Vocabularies.
To help clarify instances where multiple terms are possible, among other issues.
Example 1: North Atlantic Ocean vs Northern Atlantic Ocean vs Atlantic Ocean.
Example 2: The order in which tributaries are listed (alpha order or order of progression).
Happy to help.
Geography uses former provinces rather than current counties as first-level subdivisions.
There have been several questions about the geography, and the incompleteness of standardization contributes to the confusion.
Reconcile the geography in Vocabs against the standardization ReBioMa provided.
Question:
How do we complete the dwc:waterbody field if the waterbody is unnamed and the verbatim is "unnamed creek" or "unnamed tributary"? Do we treat this the same is if the verbatim contained "unknown"?
Currently, I am leaving "unnamed creek" and treating it in the same way I would treat a proper place name, in part because it implies the recorder might be away that the waterbody is, in fact, unnamed and NOT unknown.
Many are filled in already, but many are not. Can also add the uris for them.
For vetted standard english values, have standard values in other languages. Try to make this scalable to any number of languages.
There are still geography vocabs that give a list of named places separated by commas, such as "Wyoming, Montana". These should be omitted following the rule that the values must be standard values for a single administrative region.
Zip in Vocabularies<Dropbox.
Contains Mamm, Aves, at least one parasite, and maybe a fish or two.
Right now some of the suggestions made are erroneous. These need to be ferreted out and corrected in process.
Procure higher taxonomy from Frank Gill at IOC and replace higher taxonomy in LookupClassification where Genus matches, then do a similar cleanup on family, then order, etc.
When done, consider adding a avian_classification_resolution to the post-harvest-processor (https://github.com/VertNet/post-harvest-processor).
APSU Fish
APSU Herps
Both in Dropbox.
Posted to Dropbox > Vocabularies.
In Dropbox
The Gazetteer of aggregated georeferences has a large set of distinct higher geography. In anticipation of a locality service that includes these data and can resolve the higher geography to standardized values, add any new geography combinations to the LookupGeography table and resolve Canada, USA, and Mexico.
Run LookupGeography through OpenRefine to see what can be cleanup up quickly. Worked well with LookupClassification.
Locate and incorporate Worms taxonomy for all taxa.
Currently the vocabs are produced from content seen in VertNet migrators. Augment the vocabularies with values from VertNet that did not go through migrators.
Make recommendations for corrections via VertNet feedback mechanism.
Formalize the criteria along with the patterns for associated error values. Incorrectable should mean that there is an irresolvable inconsistency.
Is it worth adding a flag for "unable to locate"? Right now that information is in the error field only.
Should there be a flag for "there is more than one". Right now that information is in the error field only.
Should there be a flag for "no current equivalent administrative region"? Right now that might only be in the notHigherGeography field along with things that simply are not administrative regions.
Add flag for "verified"? This could act as a double-check, meaning that everything is complete and accurate according to "opinionBy" as of the opinionDate.
See georepository.com/crs_5527/SAD69-96.html
Updated today!
Homonyms have become a prevalent issue for SIB Colombia using the vocabs, so the proposed solution is to separate the Classifications for plants and animals to avoid the majority of these issues.
Solution is admittedly a band-aid, and use of a taxonomic lookup service is the better eventual solution to add outside the migrator.
Remove level with Munster, Leinster, etc. and use modern counties as first-level subdivisions, matching GADM.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.