Europeana Fulltext API

This project consists of 3 modules:

  1. A loader module that reads and parses the Europeana Newspaper XML files produced by the newspaper OCR process. The resulting objects are stored in a Mongo database.
  2. An API that
    • reads data from the Mongo database and makes it available via IIIF Presentation requests (JSON-LD);
    • searches the full text of a particular newspaper issue (record) and returns the corresponding annotations.
  3. A common module that contains the data model (Fulltext Resources, AnnoPages, Annotations, AnnotationType) shared by the loader and the API.

Implementation details

The Fulltext API implements the functionality described in §3.3 and §3.4 of the Europeana IIIF API Specification.

REQUIREMENTS

  • Java 11 and a Mongo database
  • Optionally also a Solr search engine (for search)

FUNCTIONALITY

API

Start the server and send a request to one of the following endpoints:

  • Resource [http://{server:port}/presentation/{dataset_id}/{local_id}/{resource_id}?format={2/3}] (format defaults to 2)

  • Annotation Page [http://{server:port}/presentation/{dataset_id}/{local_id}/annopage/{page_id}?format={2/3}] (format defaults to 2)

  • Annotation [http://{server:port}/presentation/{dataset_id}/{local_id}/anno/{annotation_id}?format={2/3}] (format defaults to 2)
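As an illustration of the three URL patterns above, here is a minimal Java 11 sketch that assembles each endpoint URL and fetches it with the standard java.net.http client. The class FulltextEndpoints, its method names, and the sample host and identifiers are hypothetical and not part of this project; only the path layout and the format parameter come from the list above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of the project: builds the endpoint URLs listed above.
final class FulltextEndpoints {

    private FulltextEndpoints() {
    }

    // Shared prefix: http://{server:port}/presentation/{dataset_id}/{local_id}
    private static String base(String server, String datasetId, String localId) {
        return "http://" + server + "/presentation/" + datasetId + "/" + localId;
    }

    // Resource endpoint; format is 2 or 3.
    static String resource(String server, String datasetId, String localId,
                           String resourceId, int format) {
        return base(server, datasetId, localId) + "/" + resourceId + "?format=" + format;
    }

    // Annotation Page endpoint.
    static String annoPage(String server, String datasetId, String localId,
                           String pageId, int format) {
        return base(server, datasetId, localId) + "/annopage/" + pageId + "?format=" + format;
    }

    // Annotation endpoint.
    static String annotation(String server, String datasetId, String localId,
                             String annotationId, int format) {
        return base(server, datasetId, localId) + "/anno/" + annotationId + "?format=" + format;
    }

    // Sends a plain GET and returns the response body as text (needs a running server).
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

For example, FulltextEndpoints.annoPage("localhost:8080", "1234", "record_1", "1", 3) yields http://localhost:8080/presentation/1234/record_1/annopage/1?format=3, which can then be passed to fetch once a server is listening on that (assumed) host and port.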

Loader

To batch-load zipped EDM XML files, make sure that the batch.base.directory property in the loader.properties file is set to the directory that contains them.
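A quick way to check that setting before starting a batch run is to parse the properties text and confirm the key is present. The sketch below uses only java.util.Properties; the class LoaderConfigCheck and the sample path are hypothetical, and the loader itself may read its configuration differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical pre-flight check, not part of the project.
final class LoaderConfigCheck {

    private LoaderConfigCheck() {
    }

    // Returns the value of batch.base.directory from the given properties text,
    // or throws IllegalStateException if the key is missing or blank.
    static String batchBaseDirectory(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String dir = props.getProperty("batch.base.directory");
        if (dir == null || dir.isBlank()) {
            throw new IllegalStateException("batch.base.directory is not set");
        }
        // Properties keeps trailing whitespace in values, so trim it here.
        return dir.trim();
    }
}
```

Given the text of a loader.properties file containing the line batch.base.directory=/data/fulltext/zips (an example path, not a project default), the method returns /data/fulltext/zips.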

The loader reads a single .zip file from that directory when you call the zipbatch endpoint: [http://{server:port}/fulltext/zipbatch?archive={archive.zip}]. Alternatively, pass all as the archive name to process every file in that directory, e.g.: [http://{server:port}/fulltext/zipbatch?archive=all]
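The two zipbatch calls can be sketched as a small URL builder. The class ZipBatchUrls and the sample host are hypothetical; URL-encoding the archive name is an extra precaution assumed here, not something the loader is documented to require.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the project: builds zipbatch request URLs.
final class ZipBatchUrls {

    // Special archive name that asks the loader to process every file in the directory.
    static final String ALL_ARCHIVES = "all";

    private ZipBatchUrls() {
    }

    // Builds http://{server:port}/fulltext/zipbatch?archive={archive},
    // encoding the archive name so spaces and reserved characters stay valid in the query.
    static String zipBatch(String server, String archive) {
        String encoded = URLEncoder.encode(archive, StandardCharsets.UTF_8);
        return "http://" + server + "/fulltext/zipbatch?archive=" + encoded;
    }
}
```

ZipBatchUrls.zipBatch("localhost:8080", "issues_1.zip") targets a single archive, while ZipBatchUrls.zipBatch("localhost:8080", ZipBatchUrls.ALL_ARCHIVES) requests the whole directory.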

KNOWN ISSUES

  • The current version does not yet support the use of an API key.
