Comments (13)
For now, my plan is to start with the Admin 0 – Countries shapefiles from Natural Earth and manually reconcile these with Wikidata URIs (since Natural Earth does not publish linked data). Then, I will map from our existing DBpedia URIs to Wikidata, and replace the DBP URIs in our dataset with WD ones (since the latter are better connected to other standard identifiers). However, that only handles spatial coverage at the country level. Since the DINAA periods are at the US state level, I could do something similar with the Admin 1 – States, Provinces dataset from Natural Earth (but I would only do this for the US, not the entire world). Beyond that, though, I don't know what to do—I was hoping that there would be more gazetteers out there publishing linked data with shape files by now, but there aren't. At one point Patrick had suggested that rather than indicating spatial coverage by choosing places from a gazetteer, we could just have editors drag on a map to select a bounding box… maybe that is the way to go?
from periodo-data.
Another possibility would be to forget about using shaped borders for countries and to just use bounding boxes, which Wikidata does provide.
from periodo-data.
Correction—Wikidata provides bounding boxes for most countries, but not for US states.
from periodo-data.
> For now, my plan is to start with the Admin 0 – Countries shapefiles from Natural Earth and manually reconcile these with Wikidata URIs (since Natural Earth does not publish linked data).
Where will you publish this? It might make a good npm package
from periodo-data.
> Correction: Wikidata provides bounding boxes for most countries, but not for US states.
Bounding boxes should be easy enough to calculate for lat-long points, no?
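A minimal sketch of that calculation, assuming the points do not straddle the antimeridian (180° longitude). The `LatLng` and `BoundingBox` types and the `boundingBoxOf` function are hypothetical names for illustration, and the sample coordinates are rough values, not curated data.

```typescript
// Hypothetical types for illustration; not part of PeriodO or Wikidata.
interface LatLng { lat: number; lng: number; }
interface BoundingBox { north: number; south: number; east: number; west: number; }

// Compute the smallest axis-aligned box containing all points.
// Assumption: the points do not straddle the antimeridian.
function boundingBoxOf(points: LatLng[]): BoundingBox {
  if (points.length === 0) {
    throw new Error("Cannot compute a bounding box for zero points");
  }
  let north = -Infinity;
  let south = Infinity;
  let east = -Infinity;
  let west = Infinity;
  for (const { lat, lng } of points) {
    north = Math.max(north, lat);
    south = Math.min(south, lat);
    east = Math.max(east, lng);
    west = Math.min(west, lng);
  }
  return { north, south, east, west };
}

// Rough outline points around Crete (illustrative values only).
const creteBox = boundingBoxOf([
  { lat: 35.7, lng: 23.5 },
  { lat: 34.9, lng: 24.8 },
  { lat: 35.3, lng: 26.3 },
]);
```

Regions that do cross the antimeridian (Russia, or the US including the Aleutians) would need special handling that this sketch leaves out.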
from periodo-data.
OK, so after further experimentation and thinking, here is what I propose:
tl;dr: Instead of linking each period definition to multiple spatial resources in external gazetteers, we define our own custom spatial resources. Our resources will simply be labeled bounding boxes defined as four points (northernmost point, easternmost point, etc.; Wikidata has properties for these that we can reuse). Each of our period definitions will have a single one of these spatial resources as the value of its dcterms:spatial property.
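To make the proposal concrete, here is one possible shape for these resources. This is only a sketch: the field names, the `example:` identifiers, and the `extentOf` helper are assumptions for illustration, not an agreed data model, and the coordinates are rough.

```typescript
interface LatLng { lat: number; lng: number; }

// A labeled bounding box defined by its four extreme points.
// Field names are hypothetical; they mirror the "northernmost point" etc. idea.
interface SpatialResource {
  id: string;
  label: string;
  northernmost: LatLng;
  easternmost: LatLng;
  southernmost: LatLng;
  westernmost: LatLng;
}

// Each period definition points at exactly one spatial resource.
interface PeriodDefinition {
  id: string;
  label: string;
  spatial: string; // id of a single SpatialResource (the dcterms:spatial value)
}

// Reduce the four extreme points to the four numbers of the box.
function extentOf(r: SpatialResource) {
  return {
    north: r.northernmost.lat,
    east: r.easternmost.lng,
    south: r.southernmost.lat,
    west: r.westernmost.lng,
  };
}

// Illustrative values only.
const crete: SpatialResource = {
  id: "example:crete",
  label: "Crete",
  northernmost: { lat: 35.7, lng: 23.6 },
  easternmost: { lat: 35.2, lng: 26.3 },
  southernmost: { lat: 34.9, lng: 24.8 },
  westernmost: { lat: 35.3, lng: 23.5 },
};

const examplePeriod: PeriodDefinition = {
  id: "example:period-1",
  label: "Example period",
  spatial: crete.id,
};
```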
The advantage of this is that it relieves us from having to carefully curate which external gazetteer resources we will support as spatial coverage values. The major disadvantage is that we would lose the links we have to those external gazetteers—but it's not clear to me whether anyone is using those links anyway.
Reconciliation involving location should not be affected, as we can concatenate all the labels from the current (external) spatial resources into a single label for our new (internal) spatial resource. And if we have bounding boxes, we can support reconciliation using lat/long as well.
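Reconciliation by lat/long could be as simple as a point-in-box test. The sketch below assumes boxes that do not cross the antimeridian; `LabeledBox`, `containsPoint`, and `reconcileByPoint` are hypothetical names, and the sample boxes are rough approximations.

```typescript
interface BoundingBox { north: number; south: number; east: number; west: number; }
interface LabeledBox extends BoundingBox { label: string; }

// True if the point lies inside the box (edges included).
function containsPoint(box: BoundingBox, lat: number, lng: number): boolean {
  return lat <= box.north && lat >= box.south && lng >= box.west && lng <= box.east;
}

// Area in square degrees; only used to rank boxes relative to each other.
function areaOf(box: BoundingBox): number {
  return (box.north - box.south) * (box.east - box.west);
}

// Return every box containing the point, smallest (most specific) first.
function reconcileByPoint(boxes: LabeledBox[], lat: number, lng: number): LabeledBox[] {
  return boxes
    .filter((b) => containsPoint(b, lat, lng))
    .sort((a, b) => areaOf(a) - areaOf(b));
}

// Illustrative values only.
const sampleBoxes: LabeledBox[] = [
  { label: "Greece", north: 41.8, south: 34.8, west: 19.3, east: 28.3 },
  { label: "Crete", north: 35.7, south: 34.9, west: 23.5, east: 26.3 },
  { label: "Spain", north: 43.8, south: 36.0, west: -9.3, east: 3.3 },
];

const matches = reconcileByPoint(sampleBoxes, 35.2, 25.0).map((b) => b.label);
```

Sorting by area is one way to surface the most specific match first; other rankings may suit reconciliation better.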
Converting our current (lists of) spatial coverage values should not be a problem. For entering new data, we can enable dragging on a map to select a bounding box, and show any already-defined spatial resources that are close to it. If none are adequate, we can use the bounding box they defined as the basis of a new resource, and generate a label for it by finding all the bounding boxes of modern countries that it intersects, and concatenating their labels. (For bounding boxes within the US, we can do the same with US states.)
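One way to decide which already-defined resources are "close" to a drawn box is to score them by overlap. The sketch below uses intersection-over-union, which is my substitution for "close", not something we have agreed on. The `closestResources` name, the 0.5 threshold, and the sample boxes are all assumptions.

```typescript
interface BoundingBox { north: number; south: number; east: number; west: number; }
interface LabeledBox extends BoundingBox { label: string; }

function areaOf(b: BoundingBox): number {
  return (b.north - b.south) * (b.east - b.west);
}

// Area shared by two boxes, or 0 if they do not overlap.
function intersectionArea(a: BoundingBox, b: BoundingBox): number {
  const height = Math.min(a.north, b.north) - Math.max(a.south, b.south);
  const width = Math.min(a.east, b.east) - Math.max(a.west, b.west);
  return height > 0 && width > 0 ? height * width : 0;
}

// Intersection-over-union: 1 for identical boxes, 0 for disjoint ones.
function overlapScore(a: BoundingBox, b: BoundingBox): number {
  const shared = intersectionArea(a, b);
  return shared / (areaOf(a) + areaOf(b) - shared);
}

// Existing resources whose overlap with the drawn box meets the threshold,
// best match first. The default threshold is an arbitrary starting point.
function closestResources(
  drawn: BoundingBox,
  existing: LabeledBox[],
  minScore: number = 0.5,
): LabeledBox[] {
  return existing
    .filter((e) => overlapScore(drawn, e) >= minScore)
    .sort((a, b) => overlapScore(drawn, b) - overlapScore(drawn, a));
}

// Illustrative values only.
const existingBoxes: LabeledBox[] = [
  { label: "Greece", north: 41.8, south: 34.8, west: 19.3, east: 28.3 },
  { label: "Crete", north: 35.7, south: 34.9, west: 23.5, east: 26.3 },
];
const drawnBox: BoundingBox = { north: 35.8, south: 34.8, west: 23.4, east: 26.4 };
const suggestions = closestResources(drawnBox, existingBoxes).map((b) => b.label);
```

With these values, a box drawn loosely around Crete scores high against "Crete" and very low against "Greece", so only the former would be offered.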
Thoughts?
from periodo-data.
Is there any reason we can't have our cake and eat it too -- that is, follow your original workflow to get a set of geometries for the Natural Earth boundaries for countries and US states, and then align them to Wikidata (maybe along with a limited number of custom boundaries for some major historical entities that we've used a lot -- the Roman Empire ca 2nd c. AD, the Byzantine Empire ca. 1100 AD, the Holy Roman Empire in the 13th c., the Austro-Hungarian Empire and the Ottoman Empire in the 19th c., maybe the Soviet Union ca. 1950 -- I've got some ideas about shapefiles for a few of these), but also allow a bounding box that would produce coordinate entities we'd curate ourselves? That is, a user entering data would choose between entering a spatial value via country lookup, and entering one via a bounding-box interface.
I may be misunderstanding the extent of the loss here, but I would hate to sacrifice a URI-based connection to stable external resources -- after all, this might give people a pathway to find PeriodO resources associated with one of those resources down the road. I also worry a little about cluttering search results -- the bounding-boxes Pleiades inherited from the Barrington Atlas produce a ton of false positives.
Should we ask Tom Elliott and Rainer Simon for their thoughts on this?
from periodo-data.
We could still link our bounding boxes to resources in external gazetteers, I suppose, and I agree that there's no good reason not to do that. What I'm doubting is the need for us to maintain our own detailed boundary shapes, when all we really need is a rough approximation of location.
To take a couple of concrete examples:
- The Pleiades Roman, early Empire period has as its spatial coverage the DBpedia entities for Afghanistan, Albania, Algeria, Andorra, Armenia, Austria, Azerbaijan, Bahrain, Bangladesh, Belgium, Bosnia and Herzegovina, Bulgaria, China, Croatia, Cyprus, Czech Republic, Democratic Republic of the Congo, Denmark, Djibouti, Egypt, Eritrea, Ethiopia, France, Georgia, Germany, Gibraltar, Greece, Guernsey, Hungary, India, Iran, Iraq, Ireland, Isle of Man, Israel, Italy, Jordan, Kazakhstan, Kenya, Kosovo, Kuwait, Kyrgyzstan, Latvia, Lebanon, Libya, Liechtenstein, Lithuania, Luxembourg, Macedonia, Malta, Moldova, Monaco, Montenegro, Morocco, Myanmar, Nepal, Netherlands, Norway, Oman, Pakistan, Palestine, Poland, Portugal, Republic of the Congo, Romania, Russia, Saudi Arabia, Serbia, Slovakia, Slovenia, Somalia, Spain, Sri Lanka, Sudan, Sweden, Switzerland, Syria, Tajikistan, Tunisia, Turkey, Turkmenistan, Uganda, Ukraine, United Arab Emirates, United Kingdom, Uzbekistan, Vatican City, and Yemen. That seems less usable to me (and far more verbose) than just saying that it covers a broad region from this latitude to that latitude and this longitude to that longitude. Plus, do we really need to be dealing with the URIs for and boundaries of the Isle of Man and Liechtenstein and Guernsey, just to give a rough approximation of where the Roman Empire was? And would a detailed shapefile for the early Roman Empire really serve our purposes better than a bounding box?
- The Classical Iberian Period has a spatial coverage description of “Catalan area”, but we just link it to Spain. It seems more correct to say something like “roughly this box within Spain.” The latter also seems preferable to taking on the task of finding and maintaining historical shapefiles for autonomous regions of Spain (that's @kgeographer's job 😄).
Part of the reason I'm concerned about this is bloating the data we need to manage: a bounding box is four numbers, while a boundary polygon can be thousands. This has consequences for the performance of the client.
from periodo-data.
I see your point. I think bounding-boxes should work fine for searches, though I'd suggest that we display a map with current national boundaries if possible (I'm thinking of students who don't know where, say, Greece is, and who will draw giant boxes that will return everything). I don't want to sacrifice our links to external spatial gazetteers if we can help it. And I don't want to force people who would rather select "France" from a drop-down list to have to draw a bounding-box instead, since then we'll get all kinds of differently-sized boxes.
So is the following how we're going to proceed?
1. Match our current countries, US states, and maybe a few frequent historical entities (max extents of the Roman Empire, Byzantine Empire, Holy Roman Empire, Ottoman Empire, Austro-Hungarian Empire, Soviet Union, "Europe") to a set of bounding boxes we maintain ourselves.
2. Align those bounding-boxes to entities in an external gazetteer like Wikidata (do we need to say somewhere that they're just approximations?).
3. Allow users entering data to either select a country from a pull-down menu (which would be expressed geometrically with the relevant preset bounding box) or draw a bounding-box on a map to set spatial coverage.
For 3), would the user-defined bounding box become a new entity in the dataset each time, or would it be parsed in terms of the existing country bounding-boxes it overlaps, or both? We have a bunch of "Greece except for Crete" entries, and a bunch of "Crete" ones -- would they all have the same bounding-box (for Greece), or would we generate a bounding box for "Crete" as well, and distinguish it from Greece?
from periodo-data.
It occurs to me -- if we're not requiring polygons for external gazetteer entries, couldn't we have an option to both draw a bounding box and then hit Wikidata directly for the concept? So for example your coverage is "Crete", you draw a box around "Crete", and then look up Crete in a field that pulls https://www.wikidata.org/wiki/Q34374? This puts the spatial burden on the user without losing the linking, right?
from periodo-data.
> I'd suggest that we display a map with current national boundaries if possible (I'm thinking of students who don't know where, say, Greece is, and who will draw giant boxes that will return everything). I don't want to sacrifice our links to external spatial gazetteers if we can help it. And I don't want to force people who would rather select "France" from a drop-down list to have to draw a bounding-box instead, since then we'll get all kinds of differently-sized boxes.
To clarify, what I was imagining was a UI component with both an autocomplete input and a zoomable global map. Editors could either enter a string or select a region on the map to search our existing labeled bounding boxes. They would only create a new bounding box if they didn't find anything suitable. So there should only ever be one bounding box labeled “France” (though there might be another one labeled “France, Belgium, and Luxembourg”). Given that we will be creating a bunch of bounding boxes for our existing spatial coverage values, creating new ones should be relatively rare.
For when we do need to create new bounding boxes, we would have a separate interface for authoring them. The author would again select a region on the map, but instead of treating that selection as a query over the existing bounding boxes, it would be used as the basis for a new one. We could then use the indexed bounding boxes (or polygons, we'd need to experiment) of modern countries and US states as a way to generate a useful suggested label for the box, so e.g. if one drew a box containing Albania, Macedonia, Bulgaria, and Greece we would suggest the label “Albania, Macedonia, Bulgaria, and Greece”. We'd allow the label to be edited if there were a more appropriate label such as “Byzantine Empire.”
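A rough sketch of the label suggestion, using bounding-box intersection only (polygons might well turn out to be needed to avoid spurious neighbors). `suggestLabel` and `joinLabels` are hypothetical names, the country boxes are illustrative approximations, and labels are joined in the order the countries appear in the index.

```typescript
interface BoundingBox { north: number; south: number; east: number; west: number; }
interface LabeledBox extends BoundingBox { label: string; }

// True if the two boxes share any area or edge.
function intersects(a: BoundingBox, b: BoundingBox): boolean {
  return a.south <= b.north && a.north >= b.south && a.west <= b.east && a.east >= b.west;
}

// Join labels as "A", "A and B", or "A, B, and C".
function joinLabels(labels: string[]): string {
  if (labels.length <= 1) return labels.join("");
  if (labels.length === 2) return labels[0] + " and " + labels[1];
  return labels.slice(0, -1).join(", ") + ", and " + labels[labels.length - 1];
}

// Suggest a label for a drawn box from the modern countries it touches.
function suggestLabel(drawn: BoundingBox, countries: LabeledBox[]): string {
  return joinLabels(countries.filter((c) => intersects(drawn, c)).map((c) => c.label));
}

// Illustrative values only.
const countryIndex: LabeledBox[] = [
  { label: "Albania", north: 42.7, south: 39.6, west: 19.3, east: 21.1 },
  { label: "Macedonia", north: 42.4, south: 40.9, west: 20.5, east: 23.0 },
  { label: "Bulgaria", north: 44.2, south: 41.2, west: 22.4, east: 28.6 },
  { label: "Greece", north: 41.8, south: 34.8, west: 19.3, east: 28.3 },
  { label: "Spain", north: 43.8, south: 36.0, west: -9.3, east: 3.3 },
];
const balkanBox: BoundingBox = { north: 44.5, south: 34.5, west: 19.0, east: 29.0 };
const suggested = suggestLabel(balkanBox, countryIndex);
```

With a full country index, a box like this would also touch the bounding boxes of neighbors such as Serbia or Turkey, which is one reason the box-versus-polygon question needs experimenting.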
> Match our current countries, US states, and maybe a few frequent historical entities (max extents of the Roman Empire, Byzantine Empire, Holy Roman Empire, Ottoman Empire, Austro-Hungarian Empire, Soviet Union, "Europe") to a set of bounding boxes we maintain ourselves.
Pretty much, except I was imagining that in cases where we currently have multiple linked spatial entities (e.g. Israel, Lebanon, Syria, and Jordan) we would have a single bounding box containing all of them and labeled as “Israel, Lebanon, Syria, and Jordan”. This would be a distinct entity from e.g. the bounding box containing only Israel and labeled as “Israel”.
> Align those bounding-boxes to entities in an external gazetteer like Wikidata (do we need to say somewhere that they're just approximations?).
Yes, we could link the aforementioned “Israel, Lebanon, Syria, and Jordan” box to the four corresponding Wikidata entities with a contains predicate.
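As a sketch of how such links could preserve a pathway from an external entity back to our resources: the `contains` field name and the `resourcesContaining` helper are assumptions for illustration. The Crete URL is the one cited earlier in this thread; the Q41 URL is assumed to be the Wikidata item for Greece and should be verified.

```typescript
// A spatial resource linked to the external entities it contains.
interface LinkedSpatialResource {
  label: string;
  contains: string[]; // URLs of external gazetteer entities
}

// Find every resource that lists the given external entity.
function resourcesContaining(
  resources: LinkedSpatialResource[],
  entityUrl: string,
): LinkedSpatialResource[] {
  return resources.filter((r) => r.contains.indexOf(entityUrl) !== -1);
}

const CRETE = "https://www.wikidata.org/wiki/Q34374";
const GREECE = "https://www.wikidata.org/wiki/Q41"; // assumed item for Greece

const linkedResources: LinkedSpatialResource[] = [
  { label: "Crete", contains: [CRETE] },
  { label: "Greece", contains: [GREECE, CRETE] },
];

const forCrete = resourcesContaining(linkedResources, CRETE).map((r) => r.label);
```

Someone starting from the Wikidata entity for Crete could then find both the "Crete" box and any larger box that lists it.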
> We have a bunch of "Greece except for Crete" entries, and a bunch of "Crete" ones -- would they all have the same bounding-box (for Greece), or would we generate a bounding box for "Crete" as well, and distinguish it from Greece?
This approach would enable us to have separate “Greece” and “Greece except for Crete” bounding boxes. The latter would have to be a manually edited label as the auto-suggested label for both would simply be “Greece”.
from periodo-data.
@ptgolden If you have any thoughts on the above, please chime in.
from periodo-data.