Wikidata JSKOS

Access Wikidata in JSKOS format

This node module provides a web service, a command line client, and a library to access Wikidata in JSKOS format. The data includes Wikidata items as concepts and concept schemes (read) and mappings between Wikidata and other authority files (read and write).

Background

Wikidata is a large knowledge base with detailed information about all kinds of entities. Mapping its data model to the JSKOS data format allows simplified reuse of Wikidata as an authority file. This implementation is used in the Cocoda web application but it can also be used independently.

The mapping between Wikidata and JSKOS format includes:

In addition a search service is provided for selecting a Wikidata item with typeahead.

Editing Wikidata mapping statements to other authority files requires authentication via OAuth. The following authority files have been tested successfully:

  • Basisklassifikation (BK)
  • Regensburg Classification (RVK)
  • Integrated Authority File (GND)
  • Nomisma
  • Iconclass

Other systems (not including DDC) may also work but they have not been converted to JSKOS yet, so they are not provided for browsing in Cocoda.

Install

Docker

The easiest way to run wikidata-jskos is via Docker. Please refer to the Docker documentation.

Node.js

Node.js 18 is required (Node.js 20 recommended).

git clone https://github.com/gbv/wikidata-jskos.git
cd wikidata-jskos
npm ci

Optionally make the command line tool wdjskos available:

npm link

Configuration

You can customize the application settings via a configuration file, e.g. by providing a generic config.json file and/or a more specific config.{env}.json file (where {env} is the environment like development or production). The latter takes precedence over the former, and all missing keys default to the values from config.default.json.

All configuration options can also be set via environment variables (.env when running via Node.js or using environment or env_file in Docker Compose).
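The precedence between the configuration files can be pictured as a layered merge. The following is a minimal sketch under assumptions, not the project's actual loader: the helper name mergeConfig and the sample values are made up for illustration.

```javascript
// Illustrative only: a recursive merge in which later layers win.
// mergeConfig is a hypothetical helper, not part of wikidata-jskos.
const isPlainObject = (value) =>
  value !== null && typeof value === "object" && !Array.isArray(value)

function mergeConfig(...layers) {
  const result = {}
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer || {})) {
      // Nested objects (e.g. wikibase) are merged key by key, other values are replaced
      result[key] = isPlainObject(value)
        ? mergeConfig(isPlainObject(result[key]) ? result[key] : {}, value)
        : value
    }
  }
  return result
}

// Stand-ins for config.default.json, config.json and config.production.json
const defaults = { port: 2013, wikibase: { instance: "https://www.wikidata.org" } }
const generic = { port: 3000 }
const production = { baseUrl: "https://example.org/wikidata/" }

const config = mergeConfig(defaults, generic, production)
console.log(config.port) // 3000
```

Keys that are absent from the more specific layers, such as wikibase.instance here, keep their default value.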

Some notes:

  • To use a custom Wikibase instance, you can set the subkeys of the wikibase property. Both instance and sparqlEndpoint are necessary. By default, Wikidata is used.
  • wikidata-jskos supports saving, editing, and deleting mappings in Wikidata. To enable this, you will need to provide auth.algorithm and auth.key (algorithm and key to decode the JWT, usually coming from login-server), as well as oauth.consumer_key and oauth.consumer_secret (for your registered OAuth consumer).
  • auth.key/AUTH_KEY contain line breaks. In JSON, these can simply be set as \n. When using .env or env_file, the whole key needs to be double-quoted ("-----BEGIN PUBLIC KEY-----\n..."). To set AUTH_KEY directly in docker-compose.yml via environment, please look at the included docker-compose.yml file or refer to this StackOverflow answer.
  • Please provide a baseUrl when used in production. If no baseUrl is provided, http://localhost:${port}/ will be used.
  • List of all available configuration options:

    config.json key          environment variable  default value
    title                    TITLE                 Wikidata JSKOS Service
    wikibase.instance        WIKIBASE_INSTANCE     https://www.wikidata.org
    wikibase.sparqlEndpoint  WIKIBASE_SPARQL       https://query.wikidata.org/sparql
    wikibase.api             WIKIBASE_API          ${wikibase.instance}/w/api.php
    verbosity                VERBOSITY             false
    baseUrl                  BASE_URL              http://localhost:${port}/
    port                     PORT                  2013
    auth.algorithm           AUTH_ALGORITHM        HS256
    auth.key                 AUTH_KEY              null
    oauth.consumer_key       OAUTH_KEY             null
    oauth.consumer_secret    OAUTH_SECRET          null

Usage

Run Server

For development, serve with hot reload and auto reconnect at http://localhost:2013/:

npm run start

Deployment

For deployment there is a config file to use with pm2:

cp ecosystem.example.json ecosystem.config.json
pm2 start ecosystem.config.json

To update concept schemes, regularly run:

npm run update

Web Service

An instance is available at https://coli-conc.gbv.de/services/wikidata/. The service provides selected endpoints of the JSKOS API.

Authentication

The following endpoints require an authenticated user:

  • POST /mappings
  • PUT /mappings/:_id
  • DELETE /mappings/:_id

Authentication works via a JWT (JSON Web Token). The JWT has to be provided as a Bearer token in the authorization header, e.g. Authorization: Bearer <token>. It is integrated with login-server and the JWT is required to have the same format as the one login-server provides. Specifically, the OAuth token and secret for the user need to be provided as follows:

{
  "user": {
    "identities": {
      "wikidata": {
        "oauth": {
          "token": "..",
          "token_secret": "..."
        }
      }
    }
  }
}

There are more properties in the JWT, but those are not used by wikidata-jskos. Note that the JWT needs to be signed with the respective private key for the public key provided in the configuration. Also, the OAuth user token and secret need to come from the same OAuth consumer provided in the config.
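In practice the token is issued by login-server. For local experiments, a token with the payload shape shown above could be assembled roughly as follows. This is a sketch under assumptions: it uses HS256 (the default auth.algorithm) with a made-up shared secret, signJwt is a hypothetical helper, and the token is sent in the standard HTTP Authorization header. Setups based on a public/private key pair need an asymmetric algorithm instead.

```javascript
const crypto = require("node:crypto")

// Encode a value as base64url-encoded JSON, as used for JWT segments
const encode = (value) => Buffer.from(JSON.stringify(value)).toString("base64url")

// Hypothetical helper: build an HS256-signed JWT from a payload and a shared secret
function signJwt(payload, secret) {
  const unsigned = `${encode({ alg: "HS256", typ: "JWT" })}.${encode(payload)}`
  const signature = crypto.createHmac("sha256", secret).update(unsigned).digest("base64url")
  return `${unsigned}.${signature}`
}

// Payload with the structure expected by wikidata-jskos (placeholder values)
const payload = {
  user: {
    identities: {
      wikidata: {
        oauth: { token: "<oauth-token>", token_secret: "<oauth-token-secret>" },
      },
    },
  },
}

// "example-shared-secret" is a placeholder and must match auth.key in a real setup
const token = signJwt(payload, "example-shared-secret")
const headers = { Authorization: `Bearer ${token}` }
```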

GET /status

Returns a JSKOS API status object. See JSKOS Server for details.

GET /concepts

Look up Wikidata items as JSKOS Concepts by their entity URI or QID.

  • URL Params

    uri=[uri] URIs for concepts separated by |.

    language or languages: comma separated list of language codes.

  • Success Response

    JSON array of JSKOS Concepts

Only some Wikidata properties are mapped to JSKOS fields. The result also contains broader links determined by an additional SPARQL request.

The deprecated alias at /concept is going to be removed.
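A request URL for this endpoint could be assembled as sketched below. The helper conceptsUrl is an assumption made for illustration; only the parameter names uri and language and the public instance URL come from this document.

```javascript
// Hypothetical helper: build a GET /concepts URL for a wikidata-jskos instance
function conceptsUrl(baseUrl, uris, languages = []) {
  const url = new URL("concepts", baseUrl)
  // Several concept URIs are separated by |
  url.searchParams.set("uri", uris.join("|"))
  if (languages.length) {
    // Language codes are given as a comma separated list
    url.searchParams.set("language", languages.join(","))
  }
  return url.toString()
}

const url = conceptsUrl(
  "https://coli-conc.gbv.de/services/wikidata/",
  ["http://www.wikidata.org/entity/Q42", "http://www.wikidata.org/entity/Q11351"],
  ["en", "de"],
)

// With Node.js 18 or later the result could be fetched like this:
// fetch(url).then((response) => response.json()).then(console.log)
```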

GET /concepts/suggest

OpenSearch Suggest endpoint for typeahead search.

The deprecated aliases at /concept/suggest and /suggest are going to be removed.
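An OpenSearch Suggestions response is a JSON array of the form [query, labels, descriptions, URLs]. A client might turn it into a list of objects as in the following sketch. The helper parseSuggestions and the sample response are illustrative assumptions, not output recorded from the service.

```javascript
// Hypothetical helper: turn an OpenSearch Suggestions array into a list of objects.
// The first array element (the original query string) is skipped.
function parseSuggestions([, labels = [], descriptions = [], uris = []]) {
  return labels.map((label, index) => ({
    label,
    description: descriptions[index] || "",
    uri: uris[index] || null,
  }))
}

// Made-up sample in OpenSearch Suggestions format
const sample = [
  "douglas ad",
  ["Douglas Adams"],
  ["English writer and humorist"],
  ["http://www.wikidata.org/entity/Q42"],
]

const suggestions = parseSuggestions(sample)
console.log(suggestions[0].uri) // http://www.wikidata.org/entity/Q42
```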

GET /mappings

Look up Wikidata mapping statements as JSKOS Concept Mappings between Wikidata items (query parameter from) and external identifiers (query parameter to). At least one of both parameters must be given.

  • URL Params

    from=[uriOrNotation1|uriOrNotation2|...] specify the source URI or notation (multiple URIs/notations separated by |)

    to=[uriOrNotation1|uriOrNotation2|...] specify the target URI or notation (multiple URIs/notations separated by |)

    fromScheme=[uri|notation] only show mappings from this concept scheme (URI or notation)

    toScheme=[uri|notation] only show mappings to this concept scheme (URI or notation)

    language or languages enables inclusion of entity labels. A comma separated list of language codes is used as preference list.

    mode=[mode] specify how from and to are combined: either and (default) or or

    direction=forward|backward|both searches mappings from from to to (default), reverse, or in both directions

    limit=[number] maximum number of mappings to return (not fully implemented)

    offset=[number] start number of mappings to return (not fully implemented)

Concept Schemes are identified by BARTOC IDs (e.g. http://bartoc.org/en/node/430).

  • Success Response

    JSON array of JSKOS Concept Mappings

  • Examples

    ?from=http://www.wikidata.org/entity/Q42

    ?to=http://d-nb.info/gnd/119033364

Mapping relation types (P4390) are respected, if given, see for example mapping from Wikidata to http://d-nb.info/gnd/7527800-5.

GET /mappings/voc

Look up Wikidata items with Wikidata properties for authority control as JSKOS Concept Schemes used for mappings. These schemes need to have a BARTOC ID (P2689) and be the subject item (P1629) of an external identifier property with statements P1921 (URI template) and P1793 (regular expression).

GET /mappings/:_id

Returns a specific mapping for a Wikidata claim/statement.

  • Success Response

    JSKOS object for mapping.

  • Error Response

    If no claim with _id could be found, it will return a 404 not found error.

  • Sample Call

    curl https://coli-conc.gbv.de/services/wikidata/mappings/Q11351-9968E140-6CA7-448D-BF0C-D8ED5A9F4598
    {
      "uri": "http://localhost:2013/mappings/Q11351-9968E140-6CA7-448D-BF0C-D8ED5A9F4598",
      "identifier": [
        "http://www.wikidata.org/entity/statement/Q11351-9968E140-6CA7-448D-BF0C-D8ED5A9F4598",
        "urn:jskos:mapping:content:2807c55eac85ed8c0c9254ff04b457f89b801ac9",
        "urn:jskos:mapping:members:daafcd8580e6f0304f0b1cee024f65f04da98a3c"
      ],
      "to": {
        "memberSet": [
          {
            "uri": "http://rvk.uni-regensburg.de/nt/VK",
            "notation": [
              "VK"
            ]
          }
        ]
      },
      "type": [
        "http://www.w3.org/2004/02/skos/core#exactMatch"
      ],
      "fromScheme": {
        "uri": "http://bartoc.org/en/node/1940",
        "notation": [
          "WD"
        ]
      },
      "toScheme": {
        "uri": "http://bartoc.org/en/node/533",
        "notation": [
          "RVK"
        ]
      },
      "from": {
        "memberSet": [
          {
            "uri": "http://www.wikidata.org/entity/Q11351",
            "notation": [
              "Q11351"
            ]
          }
        ]
      },
      "@context": "https://gbv.github.io/jskos/context.json"
    }
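The parts of such a mapping that clients usually need are the member URIs and the mapping type. The following sketch extracts them. The helper summarizeMapping is an assumption for illustration, and the sample object is an abbreviated copy of the response above.

```javascript
// Hypothetical helper: reduce a JSKOS mapping to its member URIs and first type
function summarizeMapping(mapping) {
  const uris = (side) => ((mapping[side] || {}).memberSet || []).map((concept) => concept.uri)
  return {
    from: uris("from"),
    to: uris("to"),
    type: (mapping.type || [])[0] || null,
  }
}

// Abbreviated version of the sample response
const mapping = {
  from: { memberSet: [{ uri: "http://www.wikidata.org/entity/Q11351", notation: ["Q11351"] }] },
  to: { memberSet: [{ uri: "http://rvk.uni-regensburg.de/nt/VK", notation: ["VK"] }] },
  type: ["http://www.w3.org/2004/02/skos/core#exactMatch"],
}

const summary = summarizeMapping(mapping)
console.log(summary.to) // [ 'http://rvk.uni-regensburg.de/nt/VK' ]
```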

POST /mappings

Saves a mapping in Wikidata. Requires authentication.

Note that if an existing mapping in Wikidata is found with the exact same members, that mapping will be overwritten by this request.

  • Success Response

    JSKOS Mapping object as it was saved in Wikidata.
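A request to this endpoint could be prepared as sketched below. Several details are assumptions: the helper saveMappingRequest is hypothetical, the mapping is sent as a JSON body with Content-Type application/json, and the JWT is passed in the standard HTTP Authorization header.

```javascript
// Hypothetical helper: prepare URL and fetch options for POST /mappings
function saveMappingRequest(baseUrl, mapping, jwt) {
  return {
    url: new URL("mappings", baseUrl).toString(),
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json", // assumed request format
        Authorization: `Bearer ${jwt}`,
      },
      body: JSON.stringify(mapping),
    },
  }
}

// 1-to-1 mapping from a Wikidata item to an RVK concept (values from the sample call)
const mapping = {
  from: { memberSet: [{ uri: "http://www.wikidata.org/entity/Q11351" }] },
  to: { memberSet: [{ uri: "http://rvk.uni-regensburg.de/nt/VK" }] },
  toScheme: { uri: "http://bartoc.org/en/node/533" },
  type: ["http://www.w3.org/2004/02/skos/core#exactMatch"],
}

const { url, options } = saveMappingRequest("http://localhost:2013/", mapping, "<jwt>")

// The request itself could then be sent with:
// fetch(url, options).then((response) => response.json())
```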

PUT /mappings/:_id

Overwrites a mapping in Wikidata. Requires authentication.

  • Success Response

    JSKOS Mapping object as it was saved in Wikidata.

DELETE /mappings/:_id

Deletes a mapping from Wikidata. Requires authentication.

  • Success Response

    Status 204, no content.

Command line tool

The command line client wdjskos provides roughly the same commands as accessible via the web service.

Mapping schemes are cached in the subfolder ./cache. To update the cache include option --force or run command update.

wdjskos concept

Look up Wikidata items as JSKOS Concepts.

wdjskos concept Q42

wdjskos mappings

Look up JSKOS Concept Mappings.

wdjskos mappings Q42 | jq .to.memberSet[].uri
wdjskos mappings - http://viaf.org/viaf/113230702

A single hyphen (-) can be used to nullify argument from or to, respectively. Mappings can be limited to a target scheme. These are equivalent:

wdjskos --scheme P227 mappings Q42
wdjskos --scheme 430 mappings Q42
wdjskos --scheme http://bartoc.org/en/node/430 mappings Q42

wdjskos schemes

Return JSKOS Concept Schemes with Wikidata properties for authority control.

wdjskos update

Look up concept schemes from Wikidata and update the cache.

wdjskos find

Search a Wikidata item by its names and return OpenSearch Suggestions response.

wdjskos mapping-item

Convert a JSKOS mapping to a Wikidata item.

wdjskos mapping-item mapping.json
wdjskos --simplify mapping-item mapping.json

API

The node library can be used to convert Wikidata JSON format to JSKOS (mapEntity) and to convert JSKOS mappings to Wikidata JSON format (mapMapping).

mapEntity

jskos = wds.mapEntity(entity)

Entity data can be retrieved via Wikidata API method wbgetentities and from Wikidata database dumps. See JavaScript libraries wikidata-sdk and wikidata-filter for easy access and processing.
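Without additional libraries, entity data could also be fetched directly from the wbgetentities API method, as in the following sketch. The helpers wbgetentitiesUrl and getEntities are assumptions for illustration; the default API URL corresponds to the default of the wikibase.api configuration option.

```javascript
// Hypothetical helper: build a wbgetentities request URL
function wbgetentitiesUrl(ids, api = "https://www.wikidata.org/w/api.php") {
  const url = new URL(api)
  url.searchParams.set("action", "wbgetentities")
  url.searchParams.set("ids", ids.join("|")) // several IDs are separated by |
  url.searchParams.set("format", "json")
  return url.toString()
}

// Hypothetical helper: fetch entities (requires the global fetch of Node.js 18 or later)
async function getEntities(ids) {
  const response = await fetch(wbgetentitiesUrl(ids))
  const { entities } = await response.json()
  return entities // object keyed by entity ID, e.g. entities.Q42
}

// Each entity could then be passed to mapEntity:
// getEntities(["Q42"]).then((entities) => wds.mapEntity(entities.Q42))
```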

Map selected parts of a Wikidata entity

All methods return a JSKOS item.

jskos = wds.mapIdentifier(entity.id)
// { uri: "http://www.wikidata.org/entity/...", notation: [ "..." ] }

jskos = wds.mapLabels(entity.labels)
// { prefLabel: { ... } }

jskos = wds.mapAliases(entity.aliases)
// { altLabel: { ... } }

jskos = wds.mapDescriptions(entity.descriptions)
// { scopeNote: { ... } }

jskos = wds.mapSitelinks(entity.sitelinks)
// { occurrences: [ { ... } ], subjectOf: [ { url: ... }, ... ] }

jskos = wds.mapClaims(entity.claims)
// ...

// convert claims with mapping properties
jskos = wds.mapMappingClaims(claims)

jskos = wds.mapInfo(entity)
// ...
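To make the result shapes more concrete, the following stand-ins mimic what the comments above describe for identifiers and labels. They are not the library code: the function names are hypothetical and the sketch assumes the usual Wikidata JSON structure, in which each label is an object with language and value.

```javascript
// Stand-in for the documented shape of mapIdentifier (NOT the library code)
function sketchMapIdentifier(id) {
  return { uri: `http://www.wikidata.org/entity/${id}`, notation: [id] }
}

// Stand-in for the documented shape of mapLabels (NOT the library code)
function sketchMapLabels(labels) {
  const prefLabel = {}
  for (const [language, { value }] of Object.entries(labels)) {
    prefLabel[language] = value
  }
  return { prefLabel }
}

// Minimal excerpt of a Wikidata entity in Wikidata JSON format
const entity = {
  id: "Q183",
  labels: {
    en: { language: "en", value: "Germany" },
    de: { language: "de", value: "Deutschland" },
  },
}

// The partial results can be combined into one JSKOS item
const jskos = { ...sketchMapIdentifier(entity.id), ...sketchMapLabels(entity.labels) }
console.log(jskos.prefLabel.de) // Deutschland
```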

Map simplified Wikidata entities

Each method has a counterpart to map simplified Wikidata entities.

jskos = wds.mapSimpleEntity(entity)
jskos = wds.mapSimpleIdentifier(entity.id)
jskos = wds.mapSimpleLabels(entity.labels)
...

mapMapping

Convert a JSKOS mapping into a Wikidata claim. Only respects JSKOS fields from, to, uri, and type (if given) and only supports 1-to-1 mappings from a single Wikidata item to a concept in another concept scheme.

This is work in progress!
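Because only 1-to-1 mappings from a single Wikidata item are supported, it can be useful to check a mapping before converting it. The following is a minimal sketch of such a check. The helper isSupportedMapping is hypothetical, and the URI pattern assumes the default Wikidata instance rather than a custom Wikibase.

```javascript
// Entity URIs of the default Wikidata instance (assumption: no custom Wikibase)
const WIKIDATA_ENTITY = /^http:\/\/www\.wikidata\.org\/entity\/Q[1-9][0-9]*$/

// Hypothetical helper: true if the mapping is 1-to-1 from a Wikidata item
// to a concept that is not itself a Wikidata item
function isSupportedMapping(mapping) {
  const from = (mapping.from || {}).memberSet || []
  const to = (mapping.to || {}).memberSet || []
  if (from.length !== 1 || to.length !== 1) {
    return false
  }
  return WIKIDATA_ENTITY.test(from[0].uri || "") && !WIKIDATA_ENTITY.test(to[0].uri || "")
}

const supported = isSupportedMapping({
  from: { memberSet: [{ uri: "http://www.wikidata.org/entity/Q11351" }] },
  to: { memberSet: [{ uri: "http://rvk.uni-regensburg.de/nt/VK" }] },
})
console.log(supported) // true
```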

Contributing

PRs accepted against the dev branch.

Small note: If editing the README, please conform to the standard-readme specification.

Publish

For maintainers only

Never work on the main branch directly. Always make changes on dev and then run the release script:

npm run release:patch # or minor or major

License

MIT © 2024 Verbundzentrale des GBV (VZG)

wikidata-jskos's Issues

Fix use of mappings from Wikidata

This requires #136 and #46 which includes heavier cleanup at several places. Try

In the Mapping Browser

Include multiple language labels on narrower concepts

Currently, narrower concepts include labels in only one language, e.g.

http://coli-conc.gbv.de/services/wikidata/data/?uri=http:%2F%2Fwww.wikidata.org%2Fentity%2FQ223143&language=de,en,es,nl,it,pl,ru,cs,jp

queries via SPARQL

SELECT DISTINCT ?from ?entity ?entityLabel WHERE {
  VALUES ?from { <http://www.wikidata.org/entity/Q223143> }
  ?entity wdt:P361 | wdt:P31 | wdt:P279 | wdt:P131 <http://www.wikidata.org/entity/Q223143>
  SERVICE wikibase:label { 
    bd:serviceParam wikibase:language "de,en,es,nl,it,pl,ru,cs,jp".
  }
}

The SPARQL label service only returns one label because this is a quick lookup. Querying all languages probably takes more time but this may be negligible:

SELECT DISTINCT ?from ?entity ?label WHERE {
  VALUES ?from { <http://www.wikidata.org/entity/Q223143> }
  ?entity wdt:P361 | wdt:P31 | wdt:P279 | wdt:P131 <http://www.wikidata.org/entity/Q223143> .
  ?entity rdfs:label ?label .
  FILTER(REGEX(LANG(?label),"^(de|en|es|nl|it|pl|ru|cs|jp)$"))
}
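The second query returns one result row per entity and language, so the rows would have to be grouped per entity. The following sketch does this for a response in the standard SPARQL JSON results format. The helper collectLabels is hypothetical and the sample bindings are made up for illustration.

```javascript
// Hypothetical helper: group SPARQL result rows into concepts with multilingual labels
function collectLabels(sparqlJson) {
  const concepts = new Map()
  for (const { entity, label } of sparqlJson.results.bindings) {
    if (!concepts.has(entity.value)) {
      concepts.set(entity.value, { uri: entity.value, prefLabel: {} })
    }
    // The language tag of a literal is given as xml:lang in SPARQL JSON results
    concepts.get(entity.value).prefLabel[label["xml:lang"]] = label.value
  }
  return Array.from(concepts.values())
}

// Made-up sample response with two rows for the same entity
const sample = {
  results: {
    bindings: [
      {
        entity: { type: "uri", value: "http://www.wikidata.org/entity/Q183" },
        label: { type: "literal", "xml:lang": "en", value: "Germany" },
      },
      {
        entity: { type: "uri", value: "http://www.wikidata.org/entity/Q183" },
        label: { type: "literal", "xml:lang": "de", value: "Deutschland" },
      },
    ],
  },
}

const narrower = collectLabels(sample)
console.log(narrower[0].prefLabel) // { en: 'Germany', de: 'Deutschland' }
```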

Add /mappings?fromScheme&toScheme lookup

To get a list of all mappings from Wikidata to a given concept scheme. The result would probably be limited to a maximum number, e.g. 10,000. fromScheme would be optional, set to http://bartoc.org/en/node/1940 (Wikidata) by default.

Sample query: https://w.wiki/7FY

SELECT ?item ?itemLabel ?value ?mappingType WHERE {
  ?item p:P5748 ?statement .
  ?statement ps:P5748 ?value .
  OPTIONAL { ?statement pq:P4390 ?mappingType }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} LIMIT 10000

Pagination (LIMIT and OFFSET) requires sorting, e.g. ORDER BY ?value. This may be too slow for very large concordances such as GND.

Avoid caching of X-Total-Count

The SPARQL query to get the total number of mappings is cached so adding a mapping does not directly change its value. Maybe enforce cache invalidation if a mapping was saved via Cocoda?

Remove note

This note to STDOUT breaks (nd)json output:

Note: To allow saving/removing mappings, authentication has to be configured (see documentation).

mapping URI missing in mapping-list

Missing URIs: https://coli-conc.gbv.de/services/wikidata/mappings?toScheme=http:%2F%2Ficonclass.org%2Frdf%2F2011%2F09%2F&limit=5 as returned for individual mappings, e.g. https://coli-conc.gbv.de/services/wikidata/mappings?toScheme=http:%2F%2Ficonclass.org%2Frdf%2F2011%2F09%2F&from=http://www.wikidata.org/entity/Q128115

At https://github.com/gbv/wikidata-jskos/blob/master/lib/queries/get-mapping-list.js#L55 the mapping URI and identifiers should be added like at

uri: `${config.baseUrl}mappings/` + claim.id.replace("$", "-"),
identifier: ["http://www.wikidata.org/entity/statement/" + claim.id.replace("$", "-")],
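Wrapped in a function, the two lines above behave as in the following sketch. The helper name mappingIdentifiers is an assumption, and the claim ID is the one behind the sample call for GET /mappings/:_id, with the $ separator that the replace call implies.

```javascript
// Hypothetical helper: derive mapping URI and identifier from a Wikidata claim ID
function mappingIdentifiers(claimId, baseUrl) {
  // Claim IDs separate item ID and GUID with $, which is replaced by - in URIs
  const id = claimId.replace("$", "-")
  return {
    uri: `${baseUrl}mappings/${id}`,
    identifier: [`http://www.wikidata.org/entity/statement/${id}`],
  }
}

const result = mappingIdentifiers(
  "Q11351$9968E140-6CA7-448D-BF0C-D8ED5A9F4598",
  "http://localhost:2013/",
)
console.log(result.uri)
// http://localhost:2013/mappings/Q11351-9968E140-6CA7-448D-BF0C-D8ED5A9F4598
```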

Move static fields from config file

The README says:

Changes to concepts and mappings will not change actual functionality. These keys are only provided so that clients know what kind of functionality is available.

If these fields cannot be used for configuration, they should be moved out of the config file.

Add pagination

Especially relevant to mappings. Return link headers like the GitHub API and jskos-server do (https://developer.github.com/v3/guides/traversing-with-pagination/), but use URL parameters limit and offset (instead of GitHub's page) for navigation.

Example:
https://coli-conc.gbv.de/services/wikidata/mappings/?from=http:%2F%2Fdewey.info%2Fclass%2F612.11%2Fe23%2F&to=http:%2F%2Fwww.wikidata.org%2Fentity%2FQ7873&direction=both&mode=or&limit=2 should return only 2 results, and the other two when appending &offset=2.
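A Link header based on limit and offset could be generated roughly as follows. This is only a sketch of the proposal: the helper paginationLinks, the default limit of 100 and the availability of a total count are assumptions.

```javascript
// Hypothetical helper: build a Link header value with prev/next relations
function paginationLinks(requestUrl, total, defaultLimit = 100) {
  const url = new URL(requestUrl)
  const limit = Number(url.searchParams.get("limit")) || defaultLimit
  const offset = Number(url.searchParams.get("offset")) || 0

  // Copy the request URL and replace only the pagination parameters
  const link = (newOffset, rel) => {
    const target = new URL(url)
    target.searchParams.set("limit", String(limit))
    target.searchParams.set("offset", String(newOffset))
    return `<${target}>; rel="${rel}"`
  }

  const links = []
  if (offset > 0) {
    links.push(link(Math.max(0, offset - limit), "prev"))
  }
  if (offset + limit < total) {
    links.push(link(offset + limit, "next"))
  }
  return links.join(", ")
}

// Four mappings in total, two per page: the first page only links to the next one
console.log(paginationLinks("http://localhost:2013/mappings?mode=or&limit=2", 4))
```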

Suggest: Prepend notations to result labels

Both jskos-server and DANTE prepend notations to the result labels for the suggest service. To have consistency between the systems, we should do the same here as well. DANTE uses an additional parameter to toggle this behavior (use=notation,label searches both notations and labels and also prepends the notations to the results), but jskos-server always prepends notations.

Possible bug in mappings-to

I created a mapping from WD Q7884093 to BK 49 with only the BK concept selected on the right side. Cocoda said "Mapping saved", but no mapping showed up in the Navigator. Looking at the network requests, I found that the request returns an empty array:

https://coli-conc.gbv.de/services/wikidata/mappings?to=http:%2F%2Furi.gbv.de%2Fterminology%2Fbk%2F49&direction=both&mode=or

On the other hand, if you select the WD concept, the mapping is returned:

https://coli-conc.gbv.de/services/wikidata/mappings?from=http:%2F%2Fwww.wikidata.org%2Fentity%2FQ7884093&direction=both&mode=or

It would be good if you could take a look at that, @nichtich. Thanks!

Avoid caching of deleted mappings

Deleting a mapping will not directly update the SPARQL results that may still include the deleted mapping. Maybe hold a list of deleted mapping URIs and filter out from the result set?
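The idea of holding a list of deleted mapping URIs could look like the following sketch. All names are hypothetical, and the sketch leaves out how entries would be removed again once the SPARQL results have caught up.

```javascript
// URIs of mappings deleted via this service (in memory only)
const deletedMappings = new Set()

// Hypothetical helper: remember a mapping URI after a successful DELETE
function markDeleted(uri) {
  deletedMappings.add(uri)
}

// Hypothetical helper: drop remembered mappings from a (possibly stale) result set
function withoutDeleted(mappings) {
  return mappings.filter((mapping) => !deletedMappings.has(mapping.uri))
}

markDeleted("http://localhost:2013/mappings/Q11351-9968E140-6CA7-448D-BF0C-D8ED5A9F4598")

const stale = [
  { uri: "http://localhost:2013/mappings/Q11351-9968E140-6CA7-448D-BF0C-D8ED5A9F4598" },
  { uri: "http://localhost:2013/mappings/another-mapping" },
]
console.log(withoutDeleted(stale).length) // 1
```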
