
periodo-data's Issues

CIDOC-CRM mapping

We need to map the terms we are using in our JSON-LD context to CIDOC-CRM terms, and serve this mapping.

Identifying aggregations of period concepts

We currently group period concepts into the same set if they share a common published source. We identify the set as a "concept scheme," which means nothing more than "an aggregation of concepts." But there are reasons other than sharing a source that we might want to aggregate period concepts. People using PeriodO to provide a period authority file for their own system may want to aggregate a selection of other people's concepts. This requires being able to aggregate period concepts into a scheme and to assign the scheme a long-term identifier. Even in cases when the users of the authority file are also the authors, we need to distinguish the two aggregations: Pleiades' "currently preferred" period authority file differs from the "authored by Pleiades" aggregation, as the latter may include deprecated period concepts.

So, we need to allow period concepts to be in multiple schemes. This would not be a problem except that currently we connect the bibliographic description of the published source to the concept scheme (under the assumption that the periods in a scheme all share the same source). If we allow concepts to belong to multiple schemes, we need to allow a scheme to contain concepts from different sources. This means we ought to attach the bibliographic description of the published source to the concept as well. This further implies that we need to assign URIs to "our" (i.e. not from Crossref or Worldcat) bibliographic descriptions, since now they may be objects of multiple statements (scheme -> source -> description and concept -> source -> description). We could use fragment identifiers for these, e.g. http://n2t.net/ark:/99152/p0fp7wv#source.

Question: are aggregations other than "same source" aggregations part of the main dataset? Do curators need to accept patches to create new aggregations? Remember that we are giving these stable URIs, which means history tracking, etc. too. So we shouldn't enter that commitment lightly. On the other hand, it does seem that we need a way to create shared, persistently identified schemes, otherwise people will just update a local copy and will have no incentive to keep the canonical dataset up to date.

To recap:

  • allow concepts to be in multiple schemes
  • attach source statements to individual period concepts
  • stop using blank nodes as objects of source statements
  • allow people to add and edit collections?

Multiple creators in a single value field

http://n2t.net/ark:/99152/p072r4q has multiple creators entered into a single field:

<http://n2t.net/ark:/99152/p072r4q> dcterms:source [ dcterms:creator [ foaf:name "Alex R. Knodell, Susan E. Alcock, Christopher A. Tuttle, Christian F. Cloke, Tali Erickson-Gini, Ceceilia Feldman, Gary O. Rollefson, Micaela Sinibaldi, Thomas M. Urban, Clive Vella" ] ] .

Need to check for other cases of this and fix them.
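A quick heuristic scan for other packed creator fields could look like this (a sketch; the segment-counting rule and the function name are mine, not part of any existing dataset tooling):

```python
def looks_like_multiple_creators(name):
    """Heuristically flag a foaf:name value that packs several people
    into one string. 'Surname, Given' style names contain at most one
    comma, so three or more comma-separated, capitalized segments
    suggest a list of distinct creators."""
    parts = [p.strip() for p in name.split(",")]
    return len(parts) >= 3 and all(p and p[0].isupper() for p in parts)

packed = "Alex R. Knodell, Susan E. Alcock, Christopher A. Tuttle"
single = "Knodell, Alex R."
```

Any flagged value would still need manual review before splitting, since some corporate author names legitimately contain commas.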

Needed: deterministic serialization of graph

Since our change system is tied to JSON Patch, I think the serialization should be JSON-LD. Maybe this would be achievable just by using a JSON-LD frame, but I've never quite figured out what frames are most useful for.
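At the JSON level, key-sorted serialization already gets most of the way there: two orderings of the same document produce identical bytes, which keeps JSON Patch diffs stable. A sketch (it does not address blank-node ordering in the underlying graph, which framing or skolemization would have to handle):

```python
import json

def canonical_json(doc):
    """Serialize a JSON document deterministically: sorted keys and
    fixed separators, so semantically identical documents are
    byte-identical."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

# Same document, different key order:
a = {"type": "skos:Concept", "label": "Iron Age"}
b = {"label": "Iron Age", "type": "skos:Concept"}
```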

Most periods are missing spatialCoverageDescription

Out of 1791 period definitions, 1207 have a blank field for spatial coverage description. I'm pretty sure that in the vast majority of cases, these periods have one single entry in spatial coverage (as in, one country). It would probably make sense to copy the text of the country name to the description.
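The copy could be done in one pass over the dataset; a sketch (key names follow my reading of the PeriodO JSON and should be checked against the actual structure):

```python
def fill_spatial_description(period):
    """If a period has exactly one spatial coverage entry and a blank
    description, copy the entry's label into the description.
    Mutates and returns the period dict."""
    coverage = period.get("spatialCoverage", [])
    if not period.get("spatialCoverageDescription") and len(coverage) == 1:
        period["spatialCoverageDescription"] = coverage[0]["label"]
    return period
```

Periods with multiple coverage entries are deliberately left untouched, since no single country name would be an accurate description there.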

CSV mapping

To map to a CSV output, we need to decide how to handle

  • Spatial coverage (I'm fine with just including the description)
  • Alternate labels
  • Localized labels

Minimize or eliminate use of blank nodes

For a variety of reasons, it is undesirable to have blank nodes. @rybesh, you pointed out problems relating to:

  • Error messages in the SHACL validator being unreadable when related to blank nodes

  • Forming certain sorts of SPARQL queries

Additionally, if we take the approach in #44, it is impossible to refer to blank nodes in an annotation.

We use blank nodes to represent start/stop resources, and for referring to specific pages within sources. We can probably address both those cases by just giving those resources URIs based off the identifier of the associated period.
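Fragment identifiers off the period's own ARK, as already proposed for source descriptions elsewhere in these issues, would cover all of these cases; a sketch (the #start and #stop fragment names are illustrative, not decided):

```python
def mint_fragment_uris(period_uri):
    """Derive stable URIs for a period's start/stop interval resources
    and its source description from the period identifier, replacing
    blank nodes with fragment identifiers."""
    return {
        "start": period_uri + "#start",
        "stop": period_uri + "#stop",
        "source": period_uri + "#source",
    }
```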

Chronostratigraphic periods

(via @atomrab)

Would you mind taking a look at the rdf and/or ttl files in this folder: https://utexas.box.com/s/wtzn309lqo1aosp84nylndn0zumft3ro and letting me know if we can ingest the 2014 version programmatically, so that I don't have to add all of these by hand? I feel like this shouldn't be too hard to line up with our model, at least for someone who can actually write scripts, and it would save a tremendous amount of time. The folder also includes half a dozen older versions of the chronostratigraphic chart, which could be really interesting to visualize (but for the moment, I'd settle with having the current version).

In case these aren't already obvious, here are some observations about the rdf and ttl files:

  1. The URIs, which do resolve properly, are in the form http://resource.geosciml.org/classifier/ics/ischart/Aeronian (though they resolve as eg http://vocabs.ands.org.au/repository/api/lda/csiro/international-chronostratigraphic-chart-2016/2016-12-v3/resource.html?uri=http://resource.geosciml.org/classifier/ics/ischart/Aeronian). These URIs, as far as I can tell, appear in the rdf representation but not in the ttl one (??).

  2. The date range is expressed in rdfs:comment as "older bound-" (= "start") and "younger bound-" (= "stop"), with a +/- uncertainty that can be incorporated into four-part dates. All these dates are in Ma (megaannum, i.e. millions of years ago), counted back from a "present" usually taken as 1950 (the Ma notation doesn't appear in the rdf/ttl, but it does in the pages that the URIs resolve to). So

    <rdfs:comment xml:lang="en">older bound-439 +/-1.8</rdfs:comment><rdfs:comment xml:lang="en">younger bound-436 +/-1.9</rdfs:comment>

    should be parsed as earliestStart: -440798050 (that is, 439 Ma plus 1.8 Ma, counted back from 1950) and latestStart: -437198050 (439 Ma minus 1.8 Ma).

  3. The alternate languages are expressed with two-character language codes, without script codes, but we could probably identify these manually for the non-Latin scripts (I know the Bulgarian is Cyrillic, but I can't identify the Chinese or Japanese character set off the top of my head).

  4. I think we can use "World" as spatial coverage, at least for a start -- I have a query in with Denné about this.

  5. There are sameAs relations with dbpedia entries here -- should we try to capture those, and if so, how? Although the concepts are the same, the dates are sometimes different (eg http://dbpedia.org/resource/Aptian has 113 +/-1 Ma as the end date, but the corresponding entry in the dataset has 112 +/-1 Ma).
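The conversion in point 2 can be written directly; a sketch (assumes "present" = 1950 and that the +/- widens the earliest bound and narrows the latest one, per the worked example above):

```python
def ma_bound_to_year(ma, uncertainty, earliest=True, present=1950):
    """Convert an ICS bound in Ma (millions of years before 'present')
    to a proleptic ISO year. The earliest reading of a bound adds the
    uncertainty; the latest reading subtracts it."""
    years_ago = (ma + uncertainty if earliest else ma - uncertainty) * 1_000_000
    return round(present - years_ago)

# "older bound-439 +/-1.8":
earliest_start = ma_bound_to_year(439, 1.8, earliest=True)
latest_start = ma_bound_to_year(439, 1.8, earliest=False)
```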

Use xsd:integer rather than xsd:gYear

Zero values, and values less than -9999 or greater than 9999, for xsd:gYear are not well-supported by RDF tools, despite what the spec says. We should change to using xsd:integer instead, which would (I think) turn our time:DateTimeDescriptions into time:GeneralDateTimeDescriptions.

New @base value for dataset

Change @base from http://n2t.net/ark:/99152/ to http://n2t.net/ark:/99152/p0 so that ids of periods and authorities in JSON-LD are "clean" i.e. don't include information about the ARK shoulder.

Additional datasets

We have agreed that the British Museum periodization, for which all the relevant information is in scope notes, will be entered by hand by Sarah using the client interface.

The following partners have not yet contributed their periodizations, all of which should probably be batch-imported from spreadsheets, if that's possible:

Deutsches Archäologisches Institut: we have the Arachne periodization (see in "source_docs" in the PeriodO thesauri dropbox folder), but it appears to lack actual dates or spatial references (though since records have those references, we might be able to ask them for a total dump of records with period terms, locations, and absolute date ranges, and extract those values). Wolfgang said that the Zenon periodization was more specific, but we haven't received it yet.

UCLA Encyclopedia of Egyptology: I missed a window with Willeke, who then went off into the field. I've added the preferred periodization that the DAI uses, which is still on the UEE website, but it may now have been superseded by an updated version. I'm waiting to hear from her to find out.

CLAROS: Sebastian Rahtz was responsive back in the spring, and I talked to him at the CAA, but he was on vacation when I wrote over the summer, and has not responded to an email since. It's not clear to me how CLAROS is using periods, in any case: "period" in the browser seems to mean only "date range", although their CRM mapping suggests they use period terms as well (so maybe they're reconciling them internally?).

I am also planning to contact Nick Croft to see if he's willing to share his RDF-expressed period gazetteer with us.

Language tags violate BCP47

@hcayless pointed out on Twitter that our language tags are out of conformance. There are two issues:

  1. BCP47 decrees that for languages with both an ISO 639-1 (2-letter) tag and an ISO 639-3 (3-letter) tag, the shortest one must be used. So we can't use deu for German, we have to use de.
  2. Although not strictly a part of the spec, BCP47 also discourages using script tags where they are unnecessary. So unless we expect to need to distinguish German sources written in the Fraktur script from those written in ordinary Latin script, we should drop the -latn from our tags. (In fact I'm not even sure we should have the script in there at all, for any of our language tags.)
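A normalization pass over the existing tags might look like this (the mapping tables are a minimal sketch covering only tags mentioned in these issues; a real fix should consult the IANA language subtag registry and its Suppress-Script fields):

```python
# Illustrative, incomplete tables -- assumptions, not a full registry.
ISO639_3_TO_1 = {"deu": "de", "eng": "en", "ell": "el", "sqi": "sq"}
REDUNDANT_SCRIPTS = {"de": "latn", "en": "latn", "sq": "latn"}

def normalize_tag(tag):
    """Prefer the 2-letter language code over the 3-letter one, and
    drop a script subtag that is the default for that language."""
    parts = tag.lower().split("-")
    lang = ISO639_3_TO_1.get(parts[0], parts[0])
    subtags = [s for s in parts[1:] if REDUNDANT_SCRIPTS.get(lang) != s]
    return "-".join([lang] + subtags)
```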

Originals of deleted duplicate LCSH periods need editing

We tried very hard to avoid these, but they crept in anyway, so Ryan will need to delete them. Here's the list:

Delete http://n2t.net/ark:/99152/p06c6g3h3wh (duplicate of http://n2t.net/ark:/99152/p06c6g3gfns)

Delete http://n2t.net/ark:/99152/p06c6g35pg5 (duplicate of http://n2t.net/ark:/99152/p06c6g3nnbs, incorrect statement about separate URI for Three Crowns' War -- the links are identical)

Delete http://n2t.net/ark:/99152/p06c6g3z9k9 (duplicate of http://n2t.net/ark:/99152/p06c6g3h)

Delete whichever is more recent of http://n2t.net/ark:/99152/p06c6g3h and http://n2t.net/ark:/99152/p06c6g35vgf

Delete http://n2t.net/ark:/99152/p06c6g3nkxb (duplicate of http://n2t.net/ark:/99152/p06c6g3f3dw)

Delete http://n2t.net/ark:/99152/p06c6g3z9b7 (duplicate of http://n2t.net/ark:/99152/p06c6g3bt2q, though after deletion add alternate Japanese label to the latter)

Delete http://n2t.net/ark:/99152/p06c6g3b46q (duplicate of http://n2t.net/ark:/99152/p06c6g3h2j9, though after deletion add alternate label to the latter)

Delete http://n2t.net/ark:/99152/p06c6g35rqq (duplicate of http://n2t.net/ark:/99152/p06c6g3rhfb, though latter needs to be updated to reflect revision of LCSH entry in 2017 which apparently removed some of earlier variants)

Delete http://n2t.net/ark:/99152/p06c6g3vkbm (duplicate of http://n2t.net/ark:/99152/p06c6g3szt6, though after deletion add alternate label to the latter)

Delete http://n2t.net/ark:/99152/p06c6g3g4sn (duplicate of http://n2t.net/ark:/99152/p06c6g34vjs)

19 periods missing structured descriptions of temporal coverage

Missing spatial coverage values from DBpedia (batch correct)

Some of our records are missing a spatial coverage value because the lookup list never included them (don't exist in DBpedia, or import failed?). We should add a country value to these before we move over to the bounding-box system, especially since we're going to pull in old values by mapping the DBpedia URIs (right?). They include:

Norway
Cambodia
South Korea (though they have North Korea, for some reason)
Moldova

If these don't exist in DBpedia at all, we should note periods with these spatial coverages and map them to the new set of geometries directly.

Label typos

I noticed that there are a couple of typos in the Fasti period list (English version). I don't want to correct these and lose the originals, since this would cause a mismatch in the URI values. Some of these will have versions in other languages as well (original languages, in most cases). So: should I a) add another column for original English, and correct the typos in a PeriodO label column? b) leave the typos alone for the moment? c) correct the typos in the current label_en column?

Question: what to do with identical LoC entries associated with different spatial coverage?

For the most part, the LoC period subject headings refer to one and only one country/spatial entity. In the case of periods for the Austro-Hungarian Empire, the LoC has two sets: one for Austria, the other for Hungary. But the periods themselves are otherwise identical. Elsewhere, we have simply added two countries to the coverage. But these periods have separate URIs in the online LoC. Do we simply produce two separate entries, one for Austria and one for Hungary, following the LoC exactly? Or do we make a single entry that maps to two nations and two URIs?

I assume the former, but I just wanted to check.

Representing curatorial descriptions as annotations

Here's a draft:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix periodo: <http://n2t.net/ark:/99152/p0v#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <#> .

<http://n2t.net/ark:/99152/p0zmdxzf369>
    a skos:Concept ;
    periodo:spatialCoverageDescription "Ras Shamra" ;
    dc:language "en" ;
    dcterms:language <http://lexvo.org/id/iso639-1/en> ;
    skos:altLabel "Pre-pottery Neolithic"@en ;
    skos:inScheme <http://n2t.net/ark:/99152/p0zmdxz> ;
    skos:prefLabel "Pre-pottery Neolithic" ;
    time:intervalStartedBy [
        skos:prefLabel "Ca. 7500 B.C.E." ;
    ] ;
    time:intervalFinishedBy [
        skos:prefLabel "Ca. 7000 B.C.E." ;
    ] .

:periodannot
    a oa:Annotation ;
    oa:motivatedBy oa:describing ;
    oa:hasTarget <http://n2t.net/ark:/99152/p0zmdxzf369> ;
    oa:hasBody [
        dcterms:spatial dbpedia:Ugarit ;
        time:intervalStartedBy [
            time:hasDateTimeDescription [
                time:year "-7499"^^xsd:gYear
            ]
        ] ;
        time:intervalFinishedBy [
            time:hasDateTimeDescription [
                time:year "-6999"^^xsd:gYear
            ]
        ]
    ] .

dbpedia:Ugarit
    skos:prefLabel "Ras Shamra" .

<http://n2t.net/ark:/99152/p0zmdxzf369.ttl>
    void:inDataset <http://n2t.net/ark:/99152/p0d> .

Question about spatial coverage at transitional moment

@rybesh, there are a number of LCSH headings that currently have sub-country spatial coverage descriptions and no spatial coverage. I was planning to go in and associate those with the larger countries, but it occurs to me that perhaps it would be better to wait and use these as test-cases for the bounding-box selection process?

Conversely, I have a number of LCSH "Byzantine Empire" coverage descriptions that correspond to very different imperial extents. Some have no spatial coverage, others have a standard but incomplete set of countries. I was planning to delete the spatial coverage values from the definitions that do have countries, in anticipation of an eventual bounding-box approach that could be calibrated to the extent of the empire in a given period -- or even pointed to a URI for an entity + shapefile for the Byzantine Empire in, say, 1100. Should I go ahead and strip the countries from the ones that have them, add countries to the ones that don't, or just sit back and wait for a new bounding-box alternative that will pull in the boundaries of the current countries?

Provenance issues

In the provenance graph we currently have, there are statements like this one:

"specializationOf": "http://n2t.net/ark:/99152/p086kj9kr9q",
"wasRevisionOf": {
    "id": "http://n2t.net/ark:/99152/p086kj9kr9q?version=0"
}

Couple questions:

  1. Does wasRevisionOf need to be a JSON object? Can it be a simple string, with the mapping to id done in the JSON-LD context?
  2. Should we have wasRevisionOf values for new assertions/collections? wasRevisionOf only makes sense to me when the new version was actually a revision of something that already existed.
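On question 1: yes, the wrapper object can go away. Declaring the term with "@type": "@id" in the JSON-LD context lets the JSON carry a plain string while the expanded RDF still treats the value as a resource. A sketch (the prov prefix binding is assumed):

```python
# JSON-LD context fragment, written as a Python dict for illustration.
context = {
    "prov": "http://www.w3.org/ns/prov#",
    "specializationOf": {"@id": "prov:specializationOf", "@type": "@id"},
    "wasRevisionOf": {"@id": "prov:wasRevisionOf", "@type": "@id"},
}

# With that context, the provenance record flattens to plain strings:
record = {
    "specializationOf": "http://n2t.net/ark:/99152/p086kj9kr9q",
    "wasRevisionOf": "http://n2t.net/ark:/99152/p086kj9kr9q?version=0",
}
```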

Also, we're not currently including type information (i.e. prov:Activity and prov:Entity). Was that on purpose? Are those implied by the relationships between things?

Update to period spreadsheet

I'd like to move our conversation about period data here and out of email, if no one objects. I'll start providing relevant updates and questions as we go.

First relevant update: Fasti period assertions now have URIs, and I've clarified that BP in their system means 2000, not 1950, as it does for C14 or prehistoric work. Please make sure you calculate those dates accordingly.

First question: once I've cleaned the Pleiades period list so that the dates are in separate fields, should I put those into the spreadsheet too, even though we don't have clear geographic coverage? Or should I wait until Tom Elliott and I can come up with some way to represent the coverage according to the locations of the sites where each of those terms is applied?

Places and geometries for spatial coverage

I've been working with the “spatial entity” picker UI that Bits Coop is building for us, and realizing that we need to do some more thinking about how we want to handle spatial entities in PeriodO. Originally, the idea was that we would use “modern countries.” That seems straightforward, until you realize that there is no agreed-upon list of modern countries. Even seemingly straightforward “countries” like France or Norway are not defined the same in the major gazetteers… And on top of that, we strayed from the “modern countries” idea, and also have some administrative regions within countries, ill-defined historical places, etc. Unfortunately, this means that we need to start maintaining our own place name + place geometry gazetteer, assembled from various different sources. I don't think there is any way around that, but I would like to do some thinking about how we can set a reasonable scope for that: something that is a happy medium between “these are the 195 modern countries that you can choose from, and that's it” and “any place you can imagine, we'll add it.”

Keeping in mind that the purpose of this is not to support sophisticated spatial reasoning but just to show and choose things on a low-resolution map, do either of you have any ideas about a sane way to scope the places we support?

Reduplication of labels in JSON

Lex caught this in the local collection Sarah made for the CHGIS periods. Some but not all of these periods have double alternate labels in the JSON (these do not appear in the client view). What is going on here, and how do we get rid of them if they're not visible for deletion? Should I have Sarah start over?

periodo-guide2-1449883467897.zip

Should spatialCoverageDescription be required?

Some entries in the dataset do not currently have a value for spatialCoverageDescription, which (if I understand correctly) is the spatial coverage explicitly defined within the source (i.e. not added by the curator).

als-latn language tag

A number of the Ariadne and FASTI definitions have the language tag als-latn on their preferred labels. als is the language code for Tosk Albanian, “the southern dialect group of the Albanian language, spoken by the ethnographic group known as Tosks.” @atomrab, can you verify that this is indeed the correct language tag, and not (as I suspect) sq, which is the language code for Albanian in general? Full list of affected definitions is below.

http://n2t.net/ark:/99152/p06v8w47jcw
http://n2t.net/ark:/99152/p06v8w48qxz
http://n2t.net/ark:/99152/p06v8w496hs
http://n2t.net/ark:/99152/p06v8w49hzs
http://n2t.net/ark:/99152/p06v8w49mp2
http://n2t.net/ark:/99152/p06v8w4bjgp
http://n2t.net/ark:/99152/p06v8w4bnsk
http://n2t.net/ark:/99152/p06v8w4br75
http://n2t.net/ark:/99152/p06v8w4d6vw
http://n2t.net/ark:/99152/p06v8w4hx4w
http://n2t.net/ark:/99152/p06v8w4jhjm
http://n2t.net/ark:/99152/p06v8w4kbt3
http://n2t.net/ark:/99152/p06v8w4m9zc
http://n2t.net/ark:/99152/p06v8w4mkjq
http://n2t.net/ark:/99152/p06v8w4nfqg
http://n2t.net/ark:/99152/p06v8w4rbc2
http://n2t.net/ark:/99152/p06v8w4s3k7
http://n2t.net/ark:/99152/p06v8w4sm9z
http://n2t.net/ark:/99152/p06v8w4vck4
http://n2t.net/ark:/99152/p06v8w4wpdc
http://n2t.net/ark:/99152/p06v8w4x58t
http://n2t.net/ark:/99152/p06v8w4xtf5
http://n2t.net/ark:/99152/p06v8w4xz8n
http://n2t.net/ark:/99152/p0qhb6623vz
http://n2t.net/ark:/99152/p0qhb662487
http://n2t.net/ark:/99152/p0qhb6626cm
http://n2t.net/ark:/99152/p0qhb662s6j
http://n2t.net/ark:/99152/p0qhb66357d
http://n2t.net/ark:/99152/p0qhb663rsj
http://n2t.net/ark:/99152/p0qhb664h77
http://n2t.net/ark:/99152/p0qhb6658kk
http://n2t.net/ark:/99152/p0qhb6674pv
http://n2t.net/ark:/99152/p0qhb667ddk
http://n2t.net/ark:/99152/p0qhb6687t3
http://n2t.net/ark:/99152/p0qhb6699r7
http://n2t.net/ark:/99152/p0qhb66ckx7
http://n2t.net/ark:/99152/p0qhb66d2kk
http://n2t.net/ark:/99152/p0qhb66dx47
http://n2t.net/ark:/99152/p0qhb66hmw3
http://n2t.net/ark:/99152/p0qhb66ht2h
http://n2t.net/ark:/99152/p0qhb66jzqd
http://n2t.net/ark:/99152/p0qhb66mvv8
http://n2t.net/ark:/99152/p0qhb66n9sq
http://n2t.net/ark:/99152/p0qhb66r4x9
http://n2t.net/ark:/99152/p0qhb66s5m4
http://n2t.net/ark:/99152/p0qhb66sp82
http://n2t.net/ark:/99152/p0qhb66tcgp
http://n2t.net/ark:/99152/p0qhb66tkjt
http://n2t.net/ark:/99152/p0qhb66vfdt
http://n2t.net/ark:/99152/p0qhb66whc9
http://n2t.net/ark:/99152/p0qhb66x9v4

Missing authors / creators / contributors in authority source metadata

This may be related to periodo/periodo-client#90. In at least one instance, a call to a Worldcat record with a clear set of authors, when adding a new collection, pulls in the title and the date but not the creators (http://www.worldcat.org/oclc/892462417). This is a problem if we're trying to make it easy to keep track of intellectual genealogies. I haven't tested to see if the problem is specific to this title, or a current problem with Worldcat titles in general.

ell-latn language tag

A number of the Ariadne definitions have the language tag ell-latn on their preferred labels, but these labels are in Greek script, not Latin. @atomrab, can you verify that these are errors? Full list below.

http://n2t.net/ark:/99152/p0qhb6628nh
http://n2t.net/ark:/99152/p0qhb662trj
http://n2t.net/ark:/99152/p0qhb662z9q
http://n2t.net/ark:/99152/p0qhb6633fn
http://n2t.net/ark:/99152/p0qhb6642m4
http://n2t.net/ark:/99152/p0qhb6643mc
http://n2t.net/ark:/99152/p0qhb66472n
http://n2t.net/ark:/99152/p0qhb664f27
http://n2t.net/ark:/99152/p0qhb664td2
http://n2t.net/ark:/99152/p0qhb665dff
http://n2t.net/ark:/99152/p0qhb665vpv
http://n2t.net/ark:/99152/p0qhb6668db
http://n2t.net/ark:/99152/p0qhb666rpb
http://n2t.net/ark:/99152/p0qhb666zg4
http://n2t.net/ark:/99152/p0qhb667873
http://n2t.net/ark:/99152/p0qhb667vht
http://n2t.net/ark:/99152/p0qhb6695gs
http://n2t.net/ark:/99152/p0qhb669r6t
http://n2t.net/ark:/99152/p0qhb66b989
http://n2t.net/ark:/99152/p0qhb66fv6h
http://n2t.net/ark:/99152/p0qhb66g38m
http://n2t.net/ark:/99152/p0qhb66hcfd
http://n2t.net/ark:/99152/p0qhb66hxr7
http://n2t.net/ark:/99152/p0qhb66jcv6
http://n2t.net/ark:/99152/p0qhb66jh4z
http://n2t.net/ark:/99152/p0qhb66jpqb
http://n2t.net/ark:/99152/p0qhb66jr4s
http://n2t.net/ark:/99152/p0qhb66k3z7
http://n2t.net/ark:/99152/p0qhb66k7q6
http://n2t.net/ark:/99152/p0qhb66kwv9
http://n2t.net/ark:/99152/p0qhb66mj5z
http://n2t.net/ark:/99152/p0qhb66p2j3
http://n2t.net/ark:/99152/p0qhb66p53z
http://n2t.net/ark:/99152/p0qhb66ps4x
http://n2t.net/ark:/99152/p0qhb66q6ff
http://n2t.net/ark:/99152/p0qhb66s4gn
http://n2t.net/ark:/99152/p0qhb66s75r
http://n2t.net/ark:/99152/p0qhb66sctf
http://n2t.net/ark:/99152/p0qhb66sm2q
http://n2t.net/ark:/99152/p0qhb66sspc
http://n2t.net/ark:/99152/p0qhb66szh6
http://n2t.net/ark:/99152/p0qhb66tdgx
http://n2t.net/ark:/99152/p0qhb66v39c
http://n2t.net/ark:/99152/p0qhb66v534
http://n2t.net/ark:/99152/p0qhb66vjx3
http://n2t.net/ark:/99152/p0qhb66vqkd
http://n2t.net/ark:/99152/p0qhb66wcdj
http://n2t.net/ark:/99152/p0qhb66wrq3
http://n2t.net/ark:/99152/p0qhb66x582
http://n2t.net/ark:/99152/p0qhb66x8n3
http://n2t.net/ark:/99152/p0qhb66xcnt
http://n2t.net/ark:/99152/p0qhb66xjj8
http://n2t.net/ark:/99152/p0qhb66z499

Documentation of data model

We need some documentation of the data model, including:

  • JSON structure
  • JSON-LD context
  • provenance

Documenting provenance involves indicating which properties have values that are original to the sources, and which have values that are our translations or parsing (so basically, label_en, the converted quantitative start and end dates, and the spatial_coverage_name, which we've parsed from spatial_coverage_label).

Perhaps we need a top-level object in our JSON-LD with various administrative things like rights info (CC0), last modified dates, pointer to previous version, authors, etc... and this could also document which statements are original and which are derived.

Alternate label language for periods

Am I correct that it is always assumed that "alternate labels" will be in English?

which means:

  • label is how the period was defined in the source
  • localizedLabel is how the period was translated in the source (?)
  • alternateLabels are created by the curator (?)

I suppose I don't understand the distinction between localizedLabel and alternateLabels, except that the latter hardcodes English and allows for multiple values.

Invalid gYear values

We have 48 instances of invalid gYear values in the canonical dataset. These values are things like 400 (should be 0400), -271 (should be -0271), and 0000 (there is no ISO year zero). Probably easiest to fix these as a batch programmatically, but before that we need to prevent it from happening in the first place (see periodo/periodo-client#118).
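The batch fix for the padding cases is mechanical; a sketch (year 0000 is returned as None for manual review, since it has no valid gYear target):

```python
import re

def fix_gyear(value):
    """Zero-pad a year to the four digits xsd:gYear requires.
    Input is the lexical form found in the data, e.g. '400', '-271'.
    Returns None for non-numeric input and for year zero, which
    does not exist in XSD 1.0."""
    m = re.fullmatch(r"(-?)(\d+)", value)
    if not m:
        return None
    sign, digits = m.groups()
    if int(digits) == 0:
        return None  # there is no ISO year zero in xsd:gYear
    return sign + digits.zfill(4)
```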

yearPublished is sometimes not a year

Currently we define yearPublished in our JSON-LD context as follows:

    "yearPublished": {
      "@type": "http://www.w3.org/2001/XMLSchema#gYear",
      "@id": "http://purl.org/dc/terms/issued"
    }

But when we get values for yearPublished from Crossref or OCLC, we use their values for the predicates dc:date and schema:datePublished. Usually these values are just years, but occasionally they are not (for example http://n2t.net/ark:/99152/p0323gx), resulting in invalid triples.

So, we need to either change our context so that the value type is less narrow (xsd:date rather than xsd:gYear) and rename the key accordingly, or we need to strip out months and days from our external sources.

Which do you prefer, @ptgolden?
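If we go with stripping, the reduction is a one-liner; a sketch (keeps yearPublished as xsd:gYear by truncating anything after a leading four-digit year):

```python
import re

def year_only(date_value):
    """Reduce a Crossref/OCLC date value to its leading year so it
    remains a valid xsd:gYear. Returns None if no year is found,
    flagging the record for manual review."""
    m = re.match(r"-?\d{4}", str(date_value))
    return m.group(0) if m else None
```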
