OAI2LOD Server

The OAI2LOD Server exposes any OAI-PMH compliant metadata repository as Linked Data. Its architecture, especially the front-end, is based on the D2R Server.

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a web-based protocol for harvesting metadata in any format from remote metadata repositories that provide an OAI-PMH enabled server. In recent years the protocol has gained much attention in the digital libraries and archives domain, and many institutions already provide such a service. Registered data providers include the Library of Congress OAI Repository, the National Library of Australia, and the Austrian National Library's Image Archive.
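To make the protocol more concrete, here is a minimal Python sketch of how a harvester might build a `ListRecords` request. The `verb`, `metadataPrefix`, and `resumptionToken` arguments come from the OAI-PMH specification; the helper name `list_records_url` and the token value are made up for illustration, and the endpoint URL is the one used in the sample configuration in this README. This is not how the OAI2LOD Server itself issues requests.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it must not be
        # combined with metadataPrefix when asking for the next page.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Endpoint taken from the sample configuration; the token is a placeholder.
ENDPOINT = "http://oai-bdb.onb.ac.at/Script/oai2.aspx"
first_page = list_records_url(ENDPOINT)
next_page = list_records_url(ENDPOINT, resumption_token="token-123")
print(first_page)
print(next_page)
```

The first request names the metadata format; every follow-up request carries only the token returned by the previous response.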

Installation

git clone git://github.com/behas/oai2lod.git
cd oai2lod
ant build

Starting OAI2LOD Server

You need to pass the name of a configuration file to the OAI2LOD server. Example configurations can be found in doc/sample_config.

cp doc/sample_config/loc_config_dc.n3 .
oai2lod-server loc_config_dc.n3

Open http://localhost:2020 in your browser.

Shut down the OAI2LOD Server by pressing CTRL-C or by killing the process.

Configuration Guide

The main configuration file, which specifies the OAI endpoint, the number of records to be harvested, and so on, is written in N3 syntax. Here is an example for an OAI2LOD server running on port 2020 that exposes metadata from the Austrian National Library's Image Archive. It is linked with DBpedia, and resources are linked based on the types and properties specified for the source and target data sources:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix oai2lod: <http://www.mediaspaces.info/vocab/oai2lod-server-config.rdf#> .

<> a oai2lod:Server;
	rdfs:label "Example OAI2LOD Server";
	oai2lod:port 2020;
	oai2lod:baseURI <http://localhost:2020/>;
	oai2lod:publishes <oai1>;
	oai2lod:linkedWith <link1>;	
	.
	
<oai1> a oai2lod:OAIServer;
	oai2lod:serverURL <http://oai-bdb.onb.ac.at/Script/oai2.aspx>;
	oai2lod:metadataPrefix "oai_dc";
	oai2lod:styleSheet "xsl/oai_dc2rdf_xml.xsl";
	oai2lod:maxRecords 50;
	.
	
<link1> a oai2lod:LinkedSPARQLEndpoint;
	oai2lod:sparqlService <http://DBpedia.org/sparql>;
	oai2lod:maxResults 5000;
	oai2lod:linkingRule <lrule1>;
	.

<lrule1> a oai2lod:LinkingRule;
	oai2lod:sourceType <http://www.mediaspaces.info/vocab/oai-pmh.rdf#Item>;
	oai2lod:sourceProperty <http://purl.org/dc/elements/1.1/subject>;
	oai2lod:targetType <http://dbpedia.org/class/yago/Capital108518505>;
	oai2lod:targetProperty <http://www.w3.org/2000/01/rdf-schema#label>;
	oai2lod:linkingProperty <http://www.w3.org/2000/01/rdf-schema#seeAlso>;
	oai2lod:similarityMetrics "uk.ac.shef.wit.simmetrics.similaritymetrics.Levenshtein";
	oai2lod:minSimilarity 1.0;
	.	

The first part after the namespace declarations contains the server settings:

  • the server name: rdfs:label "Example OAI2LOD Server";
  • the server port: oai2lod:port 2020;
  • the URL where the server can be reached: oai2lod:baseURI <http://localhost:2020/>; (this could also be a public address such as http://www.mediaspaces.info:2020/; do not forget the trailing slash)
  • a reference to an OAI-PMH definition, which represents the second part of the server settings: oai2lod:publishes <oai1>;
  • a reference to a (remote) SPARQL endpoint which represents the third part of the server settings: oai2lod:linkedWith <link1>;

The second part defines the OAI-PMH endpoint (NOTE: v0.2 supports only a single endpoint):

  • the URL of the OAI-PMH server: oai2lod:serverURL <http://oai-bdb.onb.ac.at/Script/oai2.aspx>;
  • the metadata format to be harvested, identified by its metadataPrefix: oai2lod:metadataPrefix "oai_dc";
  • the path to the stylesheet for transforming OAI-PMH XML metadata into RDF/XML: oai2lod:styleSheet "xsl/oai_dc2rdf_xml.xsl";
  • the maximum number of records to be harvested: oai2lod:maxRecords 50; the more records you harvest, the more memory is required. The OAI2LOD Server has been tested with up to 25,000 records, which is already enough for many OAI endpoints.
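The effect of a record cap such as oai2lod:maxRecords can be illustrated with a short Python sketch. It is only an assumption about how a capped harvest could work, not the server's actual code: `harvest`, `fetch_page`, `page`, and the `oai:example:*` identifiers are hypothetical, and the responses are canned strings so the example runs without network access. Only the XML namespace and the resumptionToken element are taken from the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(fetch_page, max_records):
    """Collect record identifiers until max_records is reached or the list ends.

    fetch_page(token) is a hypothetical callable returning the XML text of one
    ListRecords response; token is None for the first request.
    """
    identifiers = []
    token = None
    while len(identifiers) < max_records:
        root = ET.fromstring(fetch_page(token))
        for record in root.iter(OAI_NS + "record"):
            identifiers.append(
                record.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
            )
            if len(identifiers) == max_records:
                return identifiers  # cap reached, stop mid-page
        token = root.findtext(".//" + OAI_NS + "resumptionToken")
        if not token:  # an absent or empty token marks the last page
            break
    return identifiers


def page(ids, token=""):
    """Render a canned ListRecords response (illustration only)."""
    records = "".join(
        f"<record><header><identifier>{i}</identifier></header></record>"
        for i in ids
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        f"{records}<resumptionToken>{token}</resumptionToken>"
        "</ListRecords></OAI-PMH>"
    )


# Two canned pages, keyed by the token that requests them.
PAGES = {
    None: page(["oai:example:1", "oai:example:2"], token="t1"),
    "t1": page(["oai:example:3", "oai:example:4"]),
}

harvested = harvest(PAGES.get, max_records=3)
print(harvested)
```

Because the cap is checked after every record, the harvest stops in the middle of the second page rather than loading it completely, which keeps memory use bounded by the configured maximum.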

The third part defines the SPARQL endpoint this OAI2LOD instance should be linked with (NOTE: v0.2 supports only a single endpoint):

  • the URL of the SPARQL Service: oai2lod:sparqlService <http://DBpedia.org/sparql>;
  • the maximum number of results requested in a single SPARQL call, that is, the LIMIT of the query: oai2lod:maxResults 5000;
  • a reference to one or more linking rules: oai2lod:linkingRule <lrule1>;
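To show how oai2lod:maxResults relates to the LIMIT of a query, the following Python sketch builds a SELECT query for one target type and target property. The exact query the OAI2LOD Server sends is not documented here, so the query shape, the function name `target_query`, and the variable names are assumptions; only the URIs and the limit are taken from the sample configuration.

```python
def target_query(target_type, target_property, max_results):
    """Build a SPARQL SELECT query for typed resources and one property value.

    Hypothetical helper: the real server may phrase its query differently.
    """
    return (
        "SELECT ?resource ?value WHERE { "
        f"?resource a <{target_type}> ; <{target_property}> ?value . "
        f"}} LIMIT {max_results}"
    )


# Values copied from the sample configuration above.
query = target_query(
    "http://dbpedia.org/class/yago/Capital108518505",
    "http://www.w3.org/2000/01/rdf-schema#label",
    5000,
)
print(query)
```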

Linking rules tell the OAI2LOD Server under which conditions a resource in the OAI2LOD data set is linked with a resource in the remote data set. For each data set one must define a type (source/target type) and a property (source/target property). The linking algorithm then compares all values X, which are objects of the source property in the source data set, with all values Y, which are objects of the target property in the target data set. If X and Y are sufficiently similar, a link using the given linking property is created between the two resources. For each linking rule, the user can define a minimum similarity threshold (between 0 and 1) and the similarity algorithm to be used. Any algorithm provided by the SimMetrics library can be chosen, identified by its fully qualified class name as in the example above; see the SimMetrics JavaDoc for the available classes.
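The comparison step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's implementation: the server delegates to the SimMetrics Java library, whereas the sketch uses its own Levenshtein distance normalised by the length of the longer string, which is assumed here to approximate what the SimMetrics Levenshtein metric returns. The function names, the `urn:example:*` source URIs, and the sample values are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free on match)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalise the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def link(source, target, min_similarity):
    """Return (source_uri, target_uri) pairs whose values are similar enough."""
    return [
        (s_uri, t_uri)
        for s_uri, s_value in source
        for t_uri, t_value in target
        if similarity(s_value, t_value) >= min_similarity
    ]


# (resource URI, property value) pairs; the source URIs are placeholders.
source = [("urn:example:item1", "Vienna"), ("urn:example:item2", "Wien")]
target = [
    ("http://dbpedia.org/resource/Vienna", "Vienna"),
    ("http://dbpedia.org/resource/Vienne", "Vienne"),
]

exact = link(source, target, min_similarity=1.0)
fuzzy = link(source, target, min_similarity=0.8)
print(exact)
print(fuzzy)
```

With a threshold of 1.0, as in the sample configuration, only exact string matches produce a link. Lowering the threshold admits near matches such as "Vienna" and "Vienne", which increases recall at the cost of false links. Note that the sketch compares every source value with every target value, so the work grows with the product of the two data set sizes.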

Publications

Further details about OAI2LOD have been published in the following papers:

  • Interweaving OAI-PMH data sources with the linked data cloud

      @article{haslhofer2010interweaving,
        title={Interweaving OAI-PMH data sources with the linked data cloud},
        author={Haslhofer, B. and Schandl, B.},
        journal={International Journal of Metadata, Semantics and Ontologies},
        volume={5},
        number={1},
        pages={17--31},
        year={2010},
        publisher={Inderscience}
      }
    
  • The OAI2LOD Server: Exposing OAI-PMH metadata as linked data

      @inproceedings{haslhofer2008oai2lod,
        title={The OAI2LOD Server: Exposing OAI-PMH metadata as linked data},
        author={Haslhofer, B. and Schandl, B.},
        booktitle={1st International Workshop on Linked Data on the Web (LDOW2008), co-located with WWW 2008},
        year={2008}
      }
    

Open Issues

  • RDF persistence (harvested metadata are currently stored in-memory)
  • periodic updates / scheduler
  • support for deleted/updated records
  • full MODS support
