Comments (13)

rybesh commented on September 26, 2024

For now, my plan is to start with the Admin 0 – Countries shapefiles from Natural Earth and manually reconcile these with Wikidata URIs (since Natural Earth does not publish linked data). Then I will map our existing DBpedia URIs to Wikidata and replace the DBpedia URIs in our dataset with Wikidata ones (since the latter are better connected to other standard identifiers). However, that only handles spatial coverage at the country level. Since the DINAA periods are at the US state level, I could do something similar with the Admin 1 – States, Provinces dataset from Natural Earth (but only for the US, not the entire world). Beyond that, though, I don't know what to do. I was hoping that more gazetteers would be publishing linked data with shapefiles by now, but they aren't. At one point Patrick suggested that, rather than indicating spatial coverage by choosing places from a gazetteer, we could just have editors drag on a map to select a bounding box… maybe that is the way to go?

from periodo-data.

rybesh commented on September 26, 2024

Another possibility would be to forget about using shaped borders for countries and to just use bounding boxes, which Wikidata does provide.

rybesh commented on September 26, 2024

Correction—Wikidata provides bounding boxes for most countries, but not for US states.

ptgolden commented on September 26, 2024

> For now, my plan is to start with the Admin 0 – Countries shapefiles from Natural Earth and manually reconcile these with Wikidata URIs (since Natural Earth does not publish linked data).

Where will you publish this? It might make a good npm package

ptgolden commented on September 26, 2024

> Correction—Wikidata provides bounding boxes for most countries, but not for US states.

Bounding boxes should be easy enough to calculate for lat-long points, no?
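A minimal sketch of that calculation, assuming the input is a collection of (latitude, longitude) pairs in decimal degrees and that the region does not cross the antimeridian (a naive min/max would give the wrong box there). `BBox` and `bbox_of` are hypothetical names used only for illustration:

```python
from typing import Iterable, NamedTuple, Tuple


class BBox(NamedTuple):
    """Axis-aligned bounding box in decimal degrees."""
    south: float
    west: float
    north: float
    east: float


def bbox_of(points: Iterable[Tuple[float, float]]) -> BBox:
    """Return the smallest box containing all (lat, lon) points.

    Assumes the points do not straddle the antimeridian.
    """
    pts = list(points)
    if not pts:
        raise ValueError("need at least one point")
    lats = [lat for lat, _ in pts]
    lons = [lon for _, lon in pts]
    return BBox(south=min(lats), west=min(lons),
                north=max(lats), east=max(lons))
```

For polygon data such as a state boundary, the same function could be fed every vertex of the polygon.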

rybesh commented on September 26, 2024

OK, so after further experimentation and thinking, here is what I propose:


tl;dr: Instead of linking each period definition to multiple spatial resources in external gazetteers, we define our own custom spatial resources. Our resources will simply be labeled bounding boxes defined as four points (northernmost point, easternmost point, etc.—Wikidata has properties for these that we can reuse). Each of our period definitions will have a single one of these spatial resources as the value of its dcterms:spatial property.


The advantage of this is that it relieves us from having to carefully curate which external gazetteer resources we will support as spatial coverage values. The major disadvantage is that we would lose the links we have to those external gazetteers—but it's not clear to me whether anyone is using those links anyway.

Reconciliation involving location should not be affected, as we can concatenate all the labels from the current (external) spatial resources into a single label for our new (internal) spatial resource. And if we have bounding boxes, we can support reconciliation using lat/long as well.

Converting our current (lists of) spatial coverage values should not be a problem. For entering new data, we can enable dragging on a map to select a bounding box, and show any already-defined spatial resources that are close to it. If none are adequate, we can use the bounding box they defined as the basis of a new resource, and generate a label for it by finding all the bounding boxes of modern countries that it intersects, and concatenating their labels. (For bounding boxes within the US, we can do the same with US states.)
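The label-generation step could be sketched roughly as follows. This assumes we already hold a bounding box for each modern country (or US state) keyed by its label; `BBox`, `intersects`, `join_labels`, and `suggest_label` are hypothetical names, and alphabetical ordering of the labels is an arbitrary choice:

```python
from typing import Dict, List, NamedTuple


class BBox(NamedTuple):
    south: float
    west: float
    north: float
    east: float


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two boxes overlap or touch (no antimeridian handling)."""
    return (a.west <= b.east and b.west <= a.east and
            a.south <= b.north and b.south <= a.north)


def join_labels(names: List[str]) -> str:
    """Join labels as 'A', 'A and B', or 'A, B, and C'."""
    if len(names) <= 1:
        return "".join(names)
    if len(names) == 2:
        return f"{names[0]} and {names[1]}"
    return ", ".join(names[:-1]) + ", and " + names[-1]


def suggest_label(box: BBox, known: Dict[str, BBox]) -> str:
    """Concatenate the labels of every known box that `box` intersects."""
    hits = sorted(name for name, kb in known.items() if intersects(box, kb))
    return join_labels(hits)
```

Because bounding boxes of neighboring countries overlap, this would tend to suggest more labels than a polygon test would, so the suggestion is best treated as an editable default.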

Thoughts?

atomrab commented on September 26, 2024

Is there any reason we can't have our cake and eat it too -- that is, follow your original workflow to get a set of geometries for the Natural Earth boundaries for countries and US states, and then align them to Wikidata (maybe along with a limited number of custom boundaries for some major historical entities that we've used a lot -- the Roman Empire ca. 2nd c. AD, the Byzantine Empire ca. 1100 AD, the Holy Roman Empire in the 13th c., the Austro-Hungarian Empire and the Ottoman Empire in the 19th c., maybe the Soviet Union ca. 1950 -- I've got some ideas about shapefiles for a few of these), but also allow a bounding box that would produce coordinate entities we'd curate ourselves? That is, a user entering data would choose between entering a spatial value via country lookup and entering one via a bounding-box interface.

I may be misunderstanding the extent of the loss here, but I would hate to sacrifice a URI-based connection to stable external resources -- after all, this might give people a pathway to find PeriodO resources associated with one of those resources down the road. I also worry a little about cluttering search results -- the bounding-boxes Pleiades inherited from the Barrington Atlas produce a ton of false positives.

Should we ask Tom Elliott and Rainer Simon for their thoughts on this?

rybesh commented on September 26, 2024

We could still link our bounding boxes to resources in external gazetteers, I suppose, and I agree that there's no good reason not to do that. What I'm doubting is the need for us to maintain our own detailed boundary shapes, when all we really need is a rough approximation of location.

To take a couple of concrete examples:

  1. The Pleiades “Roman, early Empire” period has as its spatial coverage the DBpedia entities for Afghanistan, Albania, Algeria, Andorra, Armenia, Austria, Azerbaijan, Bahrain, Bangladesh, Belgium, Bosnia and Herzegovina, Bulgaria, China, Croatia, Cyprus, Czech Republic, Democratic Republic of the Congo, Denmark, Djibouti, Egypt, Eritrea, Ethiopia, France, Georgia, Germany, Gibraltar, Greece, Guernsey, Hungary, India, Iran, Iraq, Ireland, Isle of Man, Israel, Italy, Jordan, Kazakhstan, Kenya, Kosovo, Kuwait, Kyrgyzstan, Latvia, Lebanon, Libya, Liechtenstein, Lithuania, Luxembourg, Macedonia, Malta, Moldova, Monaco, Montenegro, Morocco, Myanmar, Nepal, Netherlands, Norway, Oman, Pakistan, Palestine, Poland, Portugal, Republic of the Congo, Romania, Russia, Saudi Arabia, Serbia, Slovakia, Slovenia, Somalia, Spain, Sri Lanka, Sudan, Sweden, Switzerland, Syria, Tajikistan, Tunisia, Turkey, Turkmenistan, Uganda, Ukraine, United Arab Emirates, United Kingdom, Uzbekistan, Vatican City, and Yemen. That seems less usable to me (and far more verbose) than just saying that it covers a broad region from this latitude to that latitude and this longitude to that longitude. Plus, do we really need to be dealing with the URIs for and boundaries of the Isle of Man and Liechtenstein and Guernsey, just to give a rough approximation of where the Roman Empire was? And would a detailed shapefile for the early Roman Empire really serve our purposes better than a bounding box?

  2. The Classical Iberian Period has a spatial coverage description of “Catalan area”, but we just link it to Spain. It seems more correct to say something like “roughly this box within Spain.” The latter also seems preferable to taking on the task of finding and maintaining historical shapefiles for autonomous regions of Spain (that's @kgeographer's job 😄).

Part of the reason I'm concerned about this is bloating the data we need to manage: a bounding box is four numbers, while a boundary polygon can be thousands. This has consequences for the performance of the client.

atomrab commented on September 26, 2024

I see your point. I think bounding-boxes should work fine for searches, though I'd suggest that we display a map with current national boundaries if possible (I'm thinking of students who don't know where, say, Greece is, and who will draw giant boxes that will return everything). I don't want to sacrifice our links to external spatial gazetteers if we can help it. And I don't want to force people who would rather select "France" from a drop-down list to have to draw a bounding-box instead, since then we'll get all kinds of differently-sized boxes.

So is the following how we're going to proceed?

  1. Match our current countries, US states, and maybe a few frequent historical entities (max extents of the Roman Empire, Byzantine Empire, Holy Roman Empire, Ottoman Empire, Austro-Hungarian Empire, Soviet Union, "Europe") to a set of bounding boxes we maintain ourselves.
  2. Align those bounding-boxes to entities in an external gazetteer like Wikidata (do we need to say somewhere that they're just approximations?).
  3. Allow users entering data to either select a country from a pull-down menu (which would be expressed geometrically with the relevant preset bounding box) or draw a bounding-box on a map to set spatial coverage.

For 3), would the user-defined bounding box become a new entity in the dataset each time, or would it be parsed in terms of the existing country bounding-boxes it overlaps, or both? We have a bunch of "Greece except for Crete" entries, and a bunch of "Crete" ones -- would they all have the same bounding-box (for Greece), or would we generate a bounding box for "Crete" as well, and distinguish it from Greece?

atomrab commented on September 26, 2024

It occurs to me -- if we're not requiring polygons for external gazetteer entries, couldn't we have an option to both draw a bounding box and then hit Wikidata directly for the concept? So for example your coverage is "Crete", you draw a box around "Crete", and then look up Crete in a field that pulls https://www.wikidata.org/wiki/Q34374? This puts the spatial burden on the user without losing the linking, right?

kgeographer commented on September 26, 2024

rybesh commented on September 26, 2024

> I'd suggest that we display a map with current national boundaries if possible (I'm thinking of students who don't know where, say, Greece is, and who will draw giant boxes that will return everything). I don't want to sacrifice our links to external spatial gazetteers if we can help it. And I don't want to force people who would rather select "France" from a drop-down list to have to draw a bounding-box instead, since then we'll get all kinds of differently-sized boxes.

To clarify, what I was imagining was a UI component with both an autocomplete input and a zoomable global map. Editors could either enter a string or select a region on the map to search our existing labeled bounding boxes. They would only create a new bounding box if they didn't find anything suitable. So there should only ever be one bounding box labeled “France” (though there might be another one labeled “France, Belgium, and Luxembourg”). Given that we will be creating a bunch of bounding boxes for our existing spatial coverage values, creating new ones should be relatively rare.
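To make the lookup concrete, here is a rough sketch of a query over existing labeled bounding boxes that accepts a typed string, a map selection, or both. `SpatialResource`, `overlaps`, and `search` are hypothetical names, the selection is assumed to arrive as a `(south, west, north, east)` tuple, and a real client would probably use a spatial index rather than a linear scan:

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Region = Tuple[float, float, float, float]  # (south, west, north, east)


@dataclass(frozen=True)
class SpatialResource:
    """A labeled bounding box, in decimal degrees."""
    label: str
    south: float
    west: float
    north: float
    east: float


def overlaps(r: SpatialResource, region: Region) -> bool:
    """True if the resource's box overlaps or touches the selected region."""
    south, west, north, east = region
    return (r.west <= east and west <= r.east and
            r.south <= north and south <= r.north)


def search(resources: Sequence[SpatialResource],
           text: Optional[str] = None,
           region: Optional[Region] = None) -> List[SpatialResource]:
    """Filter by case-insensitive label substring and/or map selection."""
    results = []
    for r in resources:
        if text is not None and text.lower() not in r.label.lower():
            continue
        if region is not None and not overlaps(r, region):
            continue
        results.append(r)
    return results
```

An editor would only move on to authoring a new box when this search returns nothing suitable.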

For when we do need to create new bounding boxes, we would have a separate interface for authoring them. The author would again select a region on the map, but instead of treating that selection as a query over the existing bounding boxes, it would be used as the basis for a new one. We could then use the indexed bounding boxes (or polygons, we'd need to experiment) of modern countries and US states as a way to generate a useful suggested label for the box, so e.g. if one drew a box containing Albania, Macedonia, Bulgaria, and Greece we would suggest the label “Albania, Macedonia, Bulgaria, and Greece”. We'd allow the label to be edited if there were a more appropriate label such as “Byzantine Empire.”

> Match our current countries, US states, and maybe a few frequent historical entities (max extents of the Roman Empire, Byzantine Empire, Holy Roman Empire, Ottoman Empire, Austro-Hungarian Empire, Soviet Union, "Europe") to a set of bounding boxes we maintain ourselves.

Pretty much, except I was imagining that in cases where we currently have multiple linked spatial entities (e.g. Israel, Lebanon, Syria, and Jordan) we would have a single bounding box containing all of them and labeled as “Israel, Lebanon, Syria, and Jordan”. This would be a distinct entity from e.g. the bounding box containing only Israel and labeled as “Israel”.

> Align those bounding-boxes to entities in an external gazetteer like Wikidata (do we need to say somewhere that they're just approximations?).

Yes, we could link the aforementioned “Israel, Lebanon, Syria, and Jordan” box to the four corresponding Wikidata entities with a contains predicate.
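As an illustration only, a record for such a box might be assembled like this. The field names (`label`, `bbox`, `contains`) and the function `make_spatial_resource` are assumptions for the sketch, not an agreed PeriodO schema; the only external convention relied on is Wikidata's `http://www.wikidata.org/entity/` concept URI prefix:

```python
from typing import Any, Dict, Iterable

# Prefix of Wikidata concept URIs, e.g. http://www.wikidata.org/entity/Q34374
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def make_spatial_resource(label: str,
                          south: float, west: float,
                          north: float, east: float,
                          wikidata_ids: Iterable[str]) -> Dict[str, Any]:
    """Build a labeled bounding box linked to the entities it contains.

    The key names below are hypothetical placeholders for whatever
    predicates are eventually chosen.
    """
    if south > north:
        raise ValueError("south must not exceed north")
    return {
        "label": label,
        "bbox": {"south": south, "west": west,
                 "north": north, "east": east},
        "contains": [WIKIDATA_ENTITY + qid for qid in wikidata_ids],
    }
```

A box for the four-country example would pass four identifiers; a box for Crete would pass just `"Q34374"`, the identifier mentioned earlier in this thread.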

> We have a bunch of "Greece except for Crete" entries, and a bunch of "Crete" ones -- would they all have the same bounding-box (for Greece), or would we generate a bounding box for "Crete" as well, and distinguish it from Greece?

This approach would enable us to have separate “Greece” and “Greece except for Crete” bounding boxes. The latter would have to be a manually edited label as the auto-suggested label for both would simply be “Greece”.

rybesh commented on September 26, 2024

@ptgolden If you have any thoughts on the above, please chime in.
