periodo / periodo-data
Tracking PeriodO data quality issues
Home Page: http://perio.do
License: The Unlicense
Need a VoID dataset description.
For period http://n2t.net/ark:/99152/p03wskdwdtr the spatial coverage description is Iran but the spatial coverage is Palestine. Seems wrong?
If it is wrong, we should probably do a review of all the Pleiades spatial coverages; I know we did this automatically and there may be other weird ones.
These period definitions have labels for their start and stop intervals, but are missing structured representations of one or the other. Was this an oversight, or done intentionally?
http://n2t.net/ark:/99152/p03tqbzrjp5
http://n2t.net/ark:/99152/p03tqbzvh6x
http://n2t.net/ark:/99152/p077fc5wvfd
http://n2t.net/ark:/99152/p083p5r6mhk
http://n2t.net/ark:/99152/p086kj96g7t
http://n2t.net/ark:/99152/p08m57h5rns
http://n2t.net/ark:/99152/p08m57hj8t4
http://n2t.net/ark:/99152/p08m57hmj8v
http://n2t.net/ark:/99152/p08m57htf9q
http://n2t.net/ark:/99152/p0bd664gkrp
http://n2t.net/ark:/99152/p0pqptcjxw3
http://n2t.net/ark:/99152/p0pqptcqkgd
http://n2t.net/ark:/99152/p0zj6g88v93
http://n2t.net/ark:/99152/p0zj6g8dzzb
http://n2t.net/ark:/99152/p0zj6g8hpfq
http://n2t.net/ark:/99152/p0zj6g8ktnf
http://n2t.net/ark:/99152/p0zj6g8p57k
http://n2t.net/ark:/99152/p0zj6g8tddc
http://n2t.net/ark:/99152/p0zj6g8ztv9
URLs to the Portable Antiquities Scheme collection definitions have broken as the result of a change to the PAS website. The original URLs, in the form http://finds.org.uk/database/terminology/period/id/21, lead to a 404; the current address is https://finds.org.uk/datalabs/terminology/period/id/21. Need to know if we should fix all of these by hand.
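If the answer is that these should not be fixed by hand, the rewrite is mechanical. A minimal sketch, assuming every affected URL differs only in the path prefix shown above (the function name is made up for illustration, and it has not been checked against the full set of PAS links):

```python
# Old and new prefixes taken from the two example URLs in this issue.
OLD_PREFIX = "http://finds.org.uk/database/terminology/period/id/"
NEW_PREFIX = "https://finds.org.uk/datalabs/terminology/period/id/"


def rewrite_pas_url(url: str) -> str:
    """Return the current PAS address for an old-style period URL.

    URLs that do not start with the old prefix are returned unchanged.
    """
    if url.startswith(OLD_PREFIX):
        return NEW_PREFIX + url[len(OLD_PREFIX):]
    return url
```

Each rewritten URL would still need to be checked for a 200 response before the change is submitted.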
URIs of Wikidata items follow the pattern http://www.wikidata.org/entity/ID and not http://www.wikidata.org/wiki/ID (the latter is the URI of the HTML document describing the item).
See https://www.wikidata.org/wiki/Wikidata:Data_access for more information.
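A possible batch fix, sketched under the assumption that the affected values are all item pages of the form .../wiki/Q123 (the function name is hypothetical; property pages and anything else are left alone):

```python
import re

# Matches the HTML page URL of a Wikidata item, e.g. .../wiki/Q42.
WIKI_PAGE = re.compile(r"^https?://www\.wikidata\.org/wiki/(Q\d+)$")


def to_entity_uri(uri: str) -> str:
    """Convert a Wikidata item page URL to the item's entity URI."""
    match = WIKI_PAGE.match(uri)
    if match:
        return "http://www.wikidata.org/entity/" + match.group(1)
    return uri
```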
For the most part, the LoC period subject headings refer to one and only one country/spatial entity. In the case of periods for the Austro-Hungarian Empire, the LoC has two sets: one for Austria, the other for Hungary. But the periods themselves are otherwise identical. Elsewhere, we have simply added two countries to the coverage. But these periods have separate URIs in the online LoC. Do we simply produce two separate entries, one for Austria and one for Hungary, following the LoC exactly? Or do we make a single entry that maps to two nations and two URIs?
I assume the former, but I just wanted to check.
(via @atomrab)
Would you mind taking a look at the rdf and/or ttl files in this folder: https://utexas.box.com/s/wtzn309lqo1aosp84nylndn0zumft3ro and letting me know if we can ingest the 2014 version programmatically, so that I don't have to add all of these by hand? I feel like this shouldn't be too hard to line up with our model, at least for someone who can actually write scripts, and it would save a tremendous amount of time. The folder also includes half a dozen older versions of the chronostratigraphic chart, which could be really interesting to visualize (but for the moment, I'd settle with having the current version).
In case these aren't already obvious, here are some observations about the rdf and ttl files:
The URIs, which do resolve properly, are in the form http://resource.geosciml.org/classifier/ics/ischart/Aeronian (though they resolve as eg http://vocabs.ands.org.au/repository/api/lda/csiro/international-chronostratigraphic-chart-2016/2016-12-v3/resource.html?uri=http://resource.geosciml.org/classifier/ics/ischart/Aeronian). These URIs, as far as I can tell, appear in the rdf representation but not in the ttl one (??).
The date-range is expressed in rdfs:comment as "older bound-" (="start") and "younger bound-" (="stop"), with a +/- that can be incorporated into four-part dates. All these dates are in Ma (=megayear=one million Julian years=million years ago, usually with "present" as 1950; the date notation doesn't appear in the rdf/ttl, but it does in the pages that the URIs resolve to). So
<rdfs:comment xml:lang="en">older bound-439 +/-1.8</rdfs:comment><rdfs:comment xml:lang="en">younger bound-436 +/-1.9</rdfs:comment>
should be parsed as earliestStart:-440798050 (that is, 439 Ma plus 1.8 Ma before 1950), latestStart:-437198050.
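To make the arithmetic concrete, here is a rough sketch of how one of these comments could be turned into a pair of years. It assumes the comment text always looks like the two examples above, takes "present" as 1950, and uses plain subtraction for the year. The earliestStop/latestStop keys are my guess at the matching names for the younger bound, and parse_bound is a made-up helper name.

```python
import re
from decimal import Decimal

# Pattern for comments such as "older bound-439 +/-1.8"; the +/- part is optional.
BOUND = re.compile(r"^(older|younger) bound-\s*([\d.]+)(?:\s*\+/-\s*([\d.]+))?$")
PRESENT = 1950              # assumed reference year for "Ma"
MEGA = Decimal(1_000_000)   # years per Ma


def parse_bound(comment: str) -> dict:
    """Parse one rdfs:comment bound into earliest/latest years."""
    match = BOUND.match(comment.strip())
    if match is None:
        raise ValueError(f"unrecognized bound comment: {comment!r}")
    which = match.group(1)
    age = Decimal(match.group(2))
    error = Decimal(match.group(3) or "0")
    # Decimal keeps 439 + 1.8 exact, avoiding float rounding on large values.
    earliest = PRESENT - int((age + error) * MEGA)
    latest = PRESENT - int((age - error) * MEGA)
    part = "Start" if which == "older" else "Stop"
    return {f"earliest{part}": earliest, f"latest{part}": latest}
```

For the example above, parse_bound("older bound-439 +/-1.8") gives earliestStart -440798050 and latestStart -437198050, matching the values worked out by hand.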
The alternate languages are expressed with two-character language codes, without script codes, but we could probably identify these manually for the non-Latin scripts (I know the Bulgarian is Cyrillic, but I can't identify the Chinese or Japanese character set off the top of my head).
I think we can use "World" as spatial coverage, at least for a start -- I have a query in with Denné about this.
There are sameAs relations with dbpedia entries here -- should we try to capture those, and if so, how? Although the concepts are the same, the dates are sometimes different (eg http://dbpedia.org/resource/Aptian has 113 +/-1 Ma as the end date, but the corresponding entry in the dataset has 112 +/-1 Ma).
In the provenance graph we currently have, there are statements like this one:
"specializationOf": "http://n2t.net/ark:/99152/p086kj9kr9q",
"wasRevisionOf":
{
"id": "http://n2t.net/ark:/99152/p086kj9kr9q?version=0"
}
Couple questions:
Also, we're not currently including type information (i.e. prov:Activity and prov:Entity). Was that on purpose? Are those implied by the relationships between things?
A number of the Ariadne definitions have the language tag ell-latn on their preferred labels, but these labels are in Greek script, not Latin. @atomrab, can you verify that these are errors? Full list below.
http://n2t.net/ark:/99152/p0qhb6628nh
http://n2t.net/ark:/99152/p0qhb662trj
http://n2t.net/ark:/99152/p0qhb662z9q
http://n2t.net/ark:/99152/p0qhb6633fn
http://n2t.net/ark:/99152/p0qhb6642m4
http://n2t.net/ark:/99152/p0qhb6643mc
http://n2t.net/ark:/99152/p0qhb66472n
http://n2t.net/ark:/99152/p0qhb664f27
http://n2t.net/ark:/99152/p0qhb664td2
http://n2t.net/ark:/99152/p0qhb665dff
http://n2t.net/ark:/99152/p0qhb665vpv
http://n2t.net/ark:/99152/p0qhb6668db
http://n2t.net/ark:/99152/p0qhb666rpb
http://n2t.net/ark:/99152/p0qhb666zg4
http://n2t.net/ark:/99152/p0qhb667873
http://n2t.net/ark:/99152/p0qhb667vht
http://n2t.net/ark:/99152/p0qhb6695gs
http://n2t.net/ark:/99152/p0qhb669r6t
http://n2t.net/ark:/99152/p0qhb66b989
http://n2t.net/ark:/99152/p0qhb66fv6h
http://n2t.net/ark:/99152/p0qhb66g38m
http://n2t.net/ark:/99152/p0qhb66hcfd
http://n2t.net/ark:/99152/p0qhb66hxr7
http://n2t.net/ark:/99152/p0qhb66jcv6
http://n2t.net/ark:/99152/p0qhb66jh4z
http://n2t.net/ark:/99152/p0qhb66jpqb
http://n2t.net/ark:/99152/p0qhb66jr4s
http://n2t.net/ark:/99152/p0qhb66k3z7
http://n2t.net/ark:/99152/p0qhb66k7q6
http://n2t.net/ark:/99152/p0qhb66kwv9
http://n2t.net/ark:/99152/p0qhb66mj5z
http://n2t.net/ark:/99152/p0qhb66p2j3
http://n2t.net/ark:/99152/p0qhb66p53z
http://n2t.net/ark:/99152/p0qhb66ps4x
http://n2t.net/ark:/99152/p0qhb66ps4x
http://n2t.net/ark:/99152/p0qhb66q6ff
http://n2t.net/ark:/99152/p0qhb66s4gn
http://n2t.net/ark:/99152/p0qhb66s75r
http://n2t.net/ark:/99152/p0qhb66sctf
http://n2t.net/ark:/99152/p0qhb66sm2q
http://n2t.net/ark:/99152/p0qhb66sspc
http://n2t.net/ark:/99152/p0qhb66szh6
http://n2t.net/ark:/99152/p0qhb66tdgx
http://n2t.net/ark:/99152/p0qhb66v39c
http://n2t.net/ark:/99152/p0qhb66v534
http://n2t.net/ark:/99152/p0qhb66vjx3
http://n2t.net/ark:/99152/p0qhb66vqkd
http://n2t.net/ark:/99152/p0qhb66wcdj
http://n2t.net/ark:/99152/p0qhb66wrq3
http://n2t.net/ark:/99152/p0qhb66x582
http://n2t.net/ark:/99152/p0qhb66x8n3
http://n2t.net/ark:/99152/p0qhb66xcnt
http://n2t.net/ark:/99152/p0qhb66xjj8
http://n2t.net/ark:/99152/p0qhb66z499
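A check like the one that produced the list above could be automated. The sketch below is only an approximation: it guesses the script from the first word of each character's Unicode name, which works for Greek, Latin and Cyrillic letters but is not a real script detector. Both function names are made up for illustration.

```python
import unicodedata


def dominant_script(label: str):
    """Return the most common script among the letters of a label.

    The script is approximated by the first word of the Unicode
    character name, e.g. "GREEK" or "LATIN". Returns None if the
    label contains no letters.
    """
    counts = {}
    for char in label:
        if char.isalpha():
            script = unicodedata.name(char, "").split(" ")[0]
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else None


def tag_matches_script(label: str, tag: str) -> bool:
    """Flag labels whose tag claims Latin script but whose text is not Latin."""
    if tag.lower().endswith("-latn"):
        return dominant_script(label) in ("LATIN", None)
    return True
```

A Greek-script label tagged ell-latn fails this check, while a transliterated one passes.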
A number of the Ariadne and FASTI definitions have the language tag als-latn on their preferred labels. als is the language code for Tosk Albanian, “the southern dialect group of the Albanian language, spoken by the ethnographic group known as Tosks.” @atomrab, can you verify that this is indeed the correct language tag, and not (as I suspect) sq, which is the language code for Albanian in general? Full list of affected definitions is below.
http://n2t.net/ark:/99152/p06v8w47jcw
http://n2t.net/ark:/99152/p06v8w48qxz
http://n2t.net/ark:/99152/p06v8w496hs
http://n2t.net/ark:/99152/p06v8w49hzs
http://n2t.net/ark:/99152/p06v8w49mp2
http://n2t.net/ark:/99152/p06v8w4bjgp
http://n2t.net/ark:/99152/p06v8w4bnsk
http://n2t.net/ark:/99152/p06v8w4br75
http://n2t.net/ark:/99152/p06v8w4d6vw
http://n2t.net/ark:/99152/p06v8w4hx4w
http://n2t.net/ark:/99152/p06v8w4jhjm
http://n2t.net/ark:/99152/p06v8w4kbt3
http://n2t.net/ark:/99152/p06v8w4m9zc
http://n2t.net/ark:/99152/p06v8w4mkjq
http://n2t.net/ark:/99152/p06v8w4nfqg
http://n2t.net/ark:/99152/p06v8w4rbc2
http://n2t.net/ark:/99152/p06v8w4s3k7
http://n2t.net/ark:/99152/p06v8w4sm9z
http://n2t.net/ark:/99152/p06v8w4vck4
http://n2t.net/ark:/99152/p06v8w4wpdc
http://n2t.net/ark:/99152/p06v8w4x58t
http://n2t.net/ark:/99152/p06v8w4xtf5
http://n2t.net/ark:/99152/p06v8w4xz8n
http://n2t.net/ark:/99152/p0qhb6623vz
http://n2t.net/ark:/99152/p0qhb662487
http://n2t.net/ark:/99152/p0qhb6626cm
http://n2t.net/ark:/99152/p0qhb662s6j
http://n2t.net/ark:/99152/p0qhb66357d
http://n2t.net/ark:/99152/p0qhb663rsj
http://n2t.net/ark:/99152/p0qhb664h77
http://n2t.net/ark:/99152/p0qhb6658kk
http://n2t.net/ark:/99152/p0qhb6674pv
http://n2t.net/ark:/99152/p0qhb667ddk
http://n2t.net/ark:/99152/p0qhb6687t3
http://n2t.net/ark:/99152/p0qhb6699r7
http://n2t.net/ark:/99152/p0qhb66ckx7
http://n2t.net/ark:/99152/p0qhb66d2kk
http://n2t.net/ark:/99152/p0qhb66dx47
http://n2t.net/ark:/99152/p0qhb66hmw3
http://n2t.net/ark:/99152/p0qhb66ht2h
http://n2t.net/ark:/99152/p0qhb66jzqd
http://n2t.net/ark:/99152/p0qhb66mvv8
http://n2t.net/ark:/99152/p0qhb66n9sq
http://n2t.net/ark:/99152/p0qhb66r4x9
http://n2t.net/ark:/99152/p0qhb66s5m4
http://n2t.net/ark:/99152/p0qhb66sp82
http://n2t.net/ark:/99152/p0qhb66tcgp
http://n2t.net/ark:/99152/p0qhb66tkjt
http://n2t.net/ark:/99152/p0qhb66vfdt
http://n2t.net/ark:/99152/p0qhb66whc9
http://n2t.net/ark:/99152/p0qhb66x9v4
We currently group period concepts into the same set if they share a common published source. We identify the set as a "concept scheme," which means nothing more than "an aggregation of concepts." But there are reasons other than sharing a source that we might want to aggregate period concepts. People using PeriodO to provide a period authority file for their own system may want to aggregate a selection of other people's concepts. This requires being able to aggregate period concepts into a scheme and to assign the scheme a long-term identifier. Even in cases when the users of the authority file are also the authors, we need to distinguish the two aggregations: Pleiades' "currently preferred" period authority file differs from the "authored by Pleiades" aggregation, as the latter may include deprecated period concepts.
So, we need to allow period concepts to be in multiple schemes. This would not be a problem except that currently we connect the bibliographic description of the published source to the concept scheme (under the assumption that the periods in a scheme all share the same source). If we allow concepts to belong to multiple schemes, we need to allow a scheme to contain concepts from different sources. This means we ought to attach the bibliographic description of the published source to the concept as well. This further implies that we need to assign URIs to "our" (i.e. not from Crossref or Worldcat) bibliographic descriptions, since now they may be objects of multiple statements (scheme -> source -> description and concept -> source -> description). We could use fragment identifiers for these, e.g. http://n2t.net/ark:/99152/p0fp7wv#source.
Question: are aggregations other than "same source" aggregations part of the main dataset? Do curators need to accept patches to create new aggregations? Remember that we are giving these stable URIs, which means history tracking, etc. too. So we shouldn't enter that commitment lightly. On the other hand, it does seem that we need a way to create shared, persistently identified schemes, otherwise people will just update a local copy and will have no incentive to keep the canonical dataset up to date.
To recap:
http://n2t.net/ark:/99152/p072r4q has multiple creators entered into a single field:
<http://n2t.net/ark:/99152/p072r4q> dcterms:source [ dcterms:creator [ foaf:name "Alex R. Knodell, Susan E. Alcock, Christopher A. Tuttle, Christian F. Cloke, Tali Erickson-Gini, Ceceilia Feldman, Gary O. Rollefson, Micaela Sinibaldi, Thomas M. Urban, Clive Vella" ] ] .
Need to check for other cases of this and fix them.
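One way to find and repair such cases, sketched on the assumption that names are written "Given Family" and separated by commas, as in the example above. A field that holds a single name in "Family, Given" order would be split wrongly, so the output needs a manual look. The function name is hypothetical.

```python
def split_creators(name_field: str) -> list:
    """Split a comma-separated creator string into individual names.

    Assumes each name is in "Given Family" order with no internal commas.
    """
    return [part.strip() for part in name_field.split(",") if part.strip()]
```

Any creator whose name splits into more than one part is a candidate for review.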
We need to map the terms we are using in our JSON-LD context to CIDOC-CRM terms, and serve this mapping.
Need documentation and examples of recommended ways to use PeriodO URIs.
From periodo/periodo-server#60:
Since you use literal for dcterms:language, better use dc:language (DC Elements not DC Terms). Alternatively if you want to use DCterms, pick an appropriate URL from lexvo.org
Lex caught this in the local collection Sarah made for the CHGIS periods. Some but not all of these periods have double alternate labels in the JSON (these do not appear in the client view). What is going on here, and how do we get rid of them if they're not visible for deletion? Should I have Sarah start over?
These are a headache for some users, and in many of the places where we use them we could mint fragment URIs based on the definition / collection URI.
@rybesh, there are a number of LCSH headings that currently have sub-country spatial coverage descriptions and no spatial coverage. I was planning to go in and associate those with the larger countries, but it occurs to me that perhaps it would be better to wait and use these as test-cases for the bounding-box selection process?
Conversely, I have a number of LCSH "Byzantine Empire" coverage descriptions that correspond to very different imperial extents. Some have no spatial coverage, others have a standard but incomplete set of countries. I was planning to delete the spatial coverage values from the definitions that do have countries, in anticipation of an eventual bounding-box approach that could be calibrated to the extent of the empire in a given period -- or even pointed to a URI for an entity + shapefile for the Byzantine Empire in, say, 1100. Should I go ahead and strip the countries from the ones that have them, add countries to the ones that don't, or just sit back and wait for a new bounding-box alternative that will pull in the boundaries of the current countries?
I've been working with the “spatial entity” picker UI that Bits Coop is building for us, and realizing that we need to do some more thinking about how we want to handle spatial entities in PeriodO. Originally, the idea was that we would use “modern countries.” That seems straightforward, until you realize that there is no agreed-upon list of modern countries. Even seemingly straightforward “countries” like France or Norway are not defined the same in the major gazetteers… And on top of that, we strayed from the “modern countries” idea, and also have some administrative regions within countries, ill-defined historical places, etc. Unfortunately, this means that we need to start maintaining our own place name + place geometry gazetteer, assembled from various different sources. I don't think there is any way around that, but I would like to do some thinking about how we can set a reasonable scope for that: something that is a happy medium between “these are the 195 modern countries that you can choose from, and that's it” and “any place you can imagine, we'll add it.”
Keeping in mind that the purpose of this is not to support sophisticated spatial reasoning but just to show and choose things on a low-resolution map, do either of you have any ideas about a sane way to scope the places we support?
The label for the stop year is there but the structured data is missing.
http://n2t.net/ark:/99152/p0qhb66sqt2
We need some documentation of the data model, including:
Documenting provenance involves indicating which properties have values that are original to the sources, and which have values that are our translations or parsing (so basically, label_en, the converted quantitative start and end dates, and the spatial_coverage_name, which we've parsed from spatial_coverage_label).
Perhaps we need a top-level object in our JSON-LD with various administrative things like rights info (CC0), last modified dates, pointer to previous version, authors, etc... and this could also document which statements are original and which are derived.
We tried very hard to avoid these, but they crept in anyway, so Ryan will need to delete them. Here's the list:
Delete http://n2t.net/ark:/99152/p06c6g3h3wh (duplicate of http://n2t.net/ark:/99152/p06c6g3gfns)
Delete http://n2t.net/ark:/99152/p06c6g35pg5 (duplicate of http://n2t.net/ark:/99152/p06c6g3nnbs, incorrect statement about separate URI for Three Crowns' War -- the links are identical)
Delete http://n2t.net/ark:/99152/p06c6g3z9k9 (duplicate of http://n2t.net/ark:/99152/p06c6g3h)
Delete whichever is more recent of http://n2t.net/ark:/99152/p06c6g3h and http://n2t.net/ark:/99152/p06c6g35vgf
Delete http://n2t.net/ark:/99152/p06c6g3nkxb (duplicate of http://n2t.net/ark:/99152/p06c6g3f3dw)
Delete http://n2t.net/ark:/99152/p06c6g3z9b7 (duplicate of http://n2t.net/ark:/99152/p06c6g3bt2q, though after deletion add alternate Japanese label to the latter)
Delete http://n2t.net/ark:/99152/p06c6g3b46q (duplicate of http://n2t.net/ark:/99152/p06c6g3h2j9, though after deletion add alternate label to the latter)
Delete http://n2t.net/ark:/99152/p06c6g35rqq (duplicate of http://n2t.net/ark:/99152/p06c6g3rhfb, though latter needs to be updated to reflect revision of LCSH entry in 2017 which apparently removed some of earlier variants)
Delete http://n2t.net/ark:/99152/p06c6g3vkbm (duplicate of http://n2t.net/ark:/99152/p06c6g3szt6, though after deletion add alternate label to the latter)
Delete http://n2t.net/ark:/99152/p06c6g3g4sn (duplicate of http://n2t.net/ark:/99152/p06c6g34vjs)
Start at http://datetime.hutime.org/calendar/1001.1/era/2458604.5 and follow the chain back. Need to ask Sekino-san about the difference between the Northern Court and Southern Court, and whether we need to include both or not.
Ryan Baumann suggested that this would make it easier to do sequential processing of periods, but we would need to think hard about the editing interface.
Some entries in the dataset do not currently have a value for spatialCoverageDescription, which (if I understand correctly) is the spatial coverage explicitly defined within the source (i.e. not added by the curator).
Since our change system is tied to using JSON Patch, I think it should be JSON-LD. Maybe it would be achievable just through using a JSON-LD Frame, but I've never quite figured out what those are most useful for.
Zero values, and values less than -9999 or greater than 9999, for xsd:gYear are not well-supported by RDF tools, despite what the spec says. We should change to using xsd:integer instead, which would (I think) turn our time:DateTimeDescriptions into time:GeneralDateTimeDescriptions.
Currently this is a field that appears in the JSON but is not part of the RDF data model.
An LC subject heading was duplicated again. When you have a chance, please delete http://n2t.net/ark:/99152/p06c6g3tktj (this was from accidental inclusion of a period from a subheading, when the period from the main heading already existed).
The few I checked were all from ARIADNE. These appear to have (textual) spatial coverage descriptions, but no associated gazetteer entities. Full list attached.
We have agreed that the British Museum periodization, for which all the relevant information is in scope notes, will be entered by hand by Sarah using the client interface.
The following partners have not yet contributed their periodizations, all of which should probably be batch-imported from spreadsheets, if that's possible:
Deutsches Archäologisches Institut: we have the Arachne periodization (see in "source_docs" in the PeriodO thesauri dropbox folder), but it appears to lack actual dates or spatial references (though since records have those references, we might be able to ask them for a total dump of records with period terms, locations, and absolute date ranges, and extract those values). Wolfgang said that the Zenon periodization was more specific, but we haven't received it yet.
UCLA Encyclopedia of Egyptology: I missed a window with Willeke, who then went off into the field. I've added the preferred periodization that the DAI uses, which is still on the UEE website, but it may now have been superseded by an updated version. I'm waiting to hear from her to find out.
CLAROS: Sebastian Rahtz was responsive back in the spring, and I talked to him at the CAA, but he was on vacation when I wrote over the summer, and has not responded to an email since. It's not clear to me how CLAROS is using periods, in any case: "period" in the browser seems to mean only "date range", although their CRM mapping suggests they use period terms as well (so maybe they're reconciling them internally?).
I am also planning to contact Nick Croft to see if he's willing to share his RDF-expressed period gazetteer with us.
This may be related to periodo/periodo-client#90. In at least one instance, a call to a Worldcat record with a clear set of authors, when adding a new collection, pulls in the title and the date but not the creators (http://www.worldcat.org/oclc/892462417). This is a problem if we're trying to make it easy to keep track of intellectual genealogies. I haven't tested to see if the problem is specific to this title, or a current problem with Worldcat titles in general.
E.g. splitting Iron Age into Iron Age I, Iron Age II, etc. The first should have skos:narrower relations to the rest, and the rest should have skos:broader relations to the first.
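A small sketch of the pairing, just to show that the two relations are always emitted together. The identifiers in the usage note are placeholders rather than real PeriodO ids, and the triples are plain tuples rather than anything tied to a particular RDF library.

```python
def link_subperiods(parent_id: str, child_ids: list) -> list:
    """Return (subject, predicate, object) triples linking a period to its parts.

    Each child gets skos:broader pointing at the parent, and the
    parent gets the inverse skos:narrower pointing at the child.
    """
    triples = []
    for child_id in child_ids:
        triples.append((parent_id, "skos:narrower", child_id))
        triples.append((child_id, "skos:broader", parent_id))
    return triples
```

For example, link_subperiods("iron-age", ["iron-age-1", "iron-age-2"]) yields four triples, two in each direction.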
Am I correct that it is always assumed that "alternate labels" will be in English?
which means:
label is how the period was defined in the source
localizedLabel is how the period was translated in the source (?)
alternateLabels are created by the curator (?)
I suppose I don't understand the distinction between localizedLabel and alternateLabels except that the latter hardcodes English and allows for multiple values.
We have 48 instances of invalid gYear values in the canonical dataset. These values are things like 400 (should be 0400), -271 (should be -0271), and 0000 (there is no ISO year zero). Probably easiest to fix these as a batch programmatically, but before that we need to prevent it from happening in the first place (see periodo/periodo-client#118).
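A sketch of what the batch fix might look like, assuming the invalid values are all plain integers with an optional minus sign. It pads to at least four digits and rejects zero, following the treatment of 0000 above. The function name is invented for this example.

```python
import re


def normalize_gyear(value: str) -> str:
    """Pad a year string to a valid xsd:gYear lexical form.

    Raises ValueError for non-numeric input and for year zero,
    which has to be resolved by hand.
    """
    match = re.fullmatch(r"(-?)(\d+)", value.strip())
    if match is None:
        raise ValueError(f"not a year: {value!r}")
    sign, digits = match.groups()
    if int(digits) == 0:
        raise ValueError("year zero is not allowed; fix manually")
    # Drop redundant leading zeros, then pad back up to four digits.
    return sign + digits.lstrip("0").zfill(4)
```

The same function could be reused by the client as an input check, so that the values never reach the dataset unpadded.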
Out of 1791 period definitions, 1207 have a blank field for spatial coverage description. I'm pretty sure that in the vast majority of cases, these periods have one single entry in spatial coverage (as in, one country). It would probably make sense to copy the text of the country name to the description.
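Roughly what that copy step could look like. This is a sketch only: it assumes each period is a dict with a spatialCoverageDescription string and a spatialCoverage list whose entries carry a label key, which may not match the actual JSON exactly, and the function name is hypothetical.

```python
def fill_description(period: dict) -> dict:
    """Copy the sole spatial coverage label into a blank description.

    Periods that already have a description, or that have zero or
    several coverage entries, are returned unchanged.
    """
    description = (period.get("spatialCoverageDescription") or "").strip()
    coverage = period.get("spatialCoverage") or []
    if not description and len(coverage) == 1:
        return {**period, "spatialCoverageDescription": coverage[0]["label"]}
    return period
```

Periods with several countries are skipped on purpose, since there is no obvious single description to copy.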
I'm just going to use the issue tracker to keep track of input errors that I need to go back and fix later. This one is for http://n2t.net/ark:/99152/p0wf3wdnm6q, which should be checked but appears to need an alternate name and end in 238 BC, not 476 AD. We could fix some of the alternate labels here, too -- some didn't make it out of the editorial notes.
Need to decide how to represent relationships between period definitions e.g. "derived from".
I noticed that there are a couple of typos in the Fasti period list (English version). I don't want to correct these and lose the originals, since this would cause a mismatch in the URI values. Some of these will have versions in other languages as well (original languages, in most cases). So: should I a) add another column for original English, and correct the typos in a PeriodO label column? b) leave the typos alone for the moment? c) correct the typos in the current label_en column?
We need an explicit formal namespace policy to describe just how persistent the data at each URI is going to be, and what level of edits require a new URI to be generated. See http://dublincore.org/documents/dcmi-namespace/ for an example.
Change @base from http://n2t.net/ark:/99152/ to http://n2t.net/ark:/99152/p0 so that ids of periods and authorities in JSON-LD are "clean", i.e. don't include information about the ARK shoulder.
Here's a draft:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix periodo: <http://n2t.net/ark:/99152/p0v#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://n2t.net/ark:/99152/p0zmdxzf369>
    a skos:Concept ;
    periodo:spatialCoverageDescription "Ras Shamra" ;
    dc:language "en" ;
    dcterms:language <http://lexvo.org/id/iso639-1/en> ;
    skos:altLabel "Pre-pottery Neolithic"@en ;
    skos:inScheme <http://n2t.net/ark:/99152/p0zmdxz> ;
    skos:prefLabel "Pre-pottery Neolithic" ;
    time:intervalStartedBy [
        skos:prefLabel "Ca. 7500 B.C.E." ;
    ] ;
    time:intervalFinishedBy [
        skos:prefLabel "Ca. 7000 B.C.E." ;
    ] .

:periodannot
    a oa:Annotation ;
    oa:motivation oa:describing ;
    oa:hasTarget <http://n2t.net/ark:/99152/p0zmdxzf369> ;
    oa:hasBody [
        dcterms:spatial dbpedia:Ugarit ;
        time:intervalStartedBy [
            time:hasDateTimeDescription [
                time:year "-7499"^^xsd:gYear
            ]
        ] ;
        time:intervalFinishedBy [
            time:hasDateTimeDescription [
                time:year "-6999"^^xsd:gYear
            ]
        ]
    ] .

dbpedia:Ugarit
    skos:prefLabel "Ras Shamra" .

<http://n2t.net/ark:/99152/p0zmdxzf369.ttl>
    void:inDataset <http://n2t.net/ark:/99152/p0d> .
For a variety of reasons, it is undesirable to have blank nodes. @rybesh, you pointed out problems relating to:
Error messages in the SHACL validator being unreadable when related to blank nodes
Forming certain sorts of SPARQL queries
Additionally, if we take the approach in #44, it is impossible to refer to blank nodes in an annotation.
We use blank nodes to represent start/stop resources, and for referring to specific pages within sources. We can probably address both those cases by just giving those resources URIs based off the identifier of the associated period.
To map to a CSV output, we need to decide how to handle
Some of our records are missing a spatial coverage value because the lookup list never included them (don't exist in DBpedia, or import failed?). We should add a country value to these before we move over to the bounding-box system, especially since we're going to pull in old values by mapping the DBpedia URIs (right?). They include:
Norway
Cambodia
South Korea (though they have North Korea, for some reason)
Moldova
If these don't exist in DBpedia at all, we should note periods with these spatial coverages and map them to the new set of geometries directly.
I'd like to move our conversation about period data here and out of email, if no one objects. I'll start providing relevant updates and questions as we go.
First relevant update: Fasti period assertions now have URIs, and I've clarified that BP in their system means 2000, not 1950, as it does for C14 or prehistoric work. Please make sure you calculate those dates accordingly.
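For anyone doing the conversion, the difference amounts to a 50-year shift. A minimal sketch, assuming plain subtraction from the reference year is what we want (so negative results are years before 1 CE counted with a year zero); the helper name and constants are just for illustration.

```python
C14_PRESENT = 1950    # conventional "present" for C14 / prehistoric BP dates
FASTI_PRESENT = 2000  # "present" as Fasti uses BP, per the note above


def bp_to_year(years_bp: int, present: int = C14_PRESENT) -> int:
    """Convert a 'before present' value to a calendar year number."""
    return present - years_bp
```

So 3000 BP is -1000 in the Fasti data but -1050 under the usual 1950 convention.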
First question: once I've cleaned the Pleiades period list so that the dates are in separate fields, should I put those into the spreadsheet too, even though we don't have clear geographic coverage? Or should I wait until Tom Elliott and I can come up with some way to represent the coverage according to the locations of the sites where each of those terms is applied?
@rybesh , you mentioned you have used some kind of "circa" predicate before- what was it?
@hcayless pointed out on Twitter that our language tags are out of conformance. There are two issues:
Instead of deu for German, we have to use de.
We need to remove -latn from our tags. (In fact I'm not even sure we should have the script in there at all, for any of our language tags.)
Currently we define yearPublished in our JSON-LD context as follows:
"yearPublished": {
"@type": "http://www.w3.org/2001/XMLSchema#gYear",
"@id": "http://purl.org/dc/terms/issued"
}
But when we get values for yearPublished from Crossref or OCLC, we use their values for the predicates dc:date and schema:datePublished. Usually these values are just years, but occasionally they are not (for example http://n2t.net/ark:/99152/p0323gx), resulting in invalid triples.
So, we need to either change our context so that the value type is less narrow (xsd:date rather than xsd:gYear) and rename the key accordingly, or we need to strip out months and days from our external sources.
Which do you prefer, @ptgolden?
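If we go with the second option (stripping months and days), the change is small. A sketch, assuming the external values are either a four-digit year or an ISO-style YYYY-MM or YYYY-MM-DD string; anything else returns None so it can be reviewed by hand. The function name is invented.

```python
import re

# Accepts "2014", "2014-05" or "2014-05-12" and captures the year part.
ISO_DATE = re.compile(r"(-?\d{4})(?:-\d{2}(?:-\d{2})?)?")


def year_only(value: str):
    """Return the year portion of an ISO-style date string, or None."""
    match = ISO_DATE.fullmatch(value.strip())
    return match.group(1) if match else None
```

The first option (widening the type) would avoid discarding information from the source, at the cost of renaming the key.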