harshdeep1996.github.io's Introduction

Libretti Rolandi Entity Extraction

Contents

The repository is organised as follows:

  • code: contains all the code that extracts entities from the coperte (covers) and the title metadata of the libretti, and links them to external and internal sources.

To reproduce the results, run the scripts in this folder in numeric order:

python 01_scrapper.py
python 02_place_extraction.py
python 03_fuzzy_place_extraction.py
python 04_composers_extraction.py
python 05_location_extraction.py
python 06_title_extraction.py
python 07_genre_extraction.py
python 08_occasion_extraction.py
python 09_quick_fixes.py

Each script does the following:

  • scraper: downloads the manifests of the libretti into the folder manifests

  • place extraction: runs OCR on the coperte of the libretti and extracts a tentative city name; stores a CSV file with the existing metadata and the extracted city in the folder data

  • fuzzy place extraction: extracts a tentative city name using fuzzy matching; stores a new CSV file in the folder data

  • composers extraction: extracts composer names from the coperte and titles; stores a new CSV file in the folder data

  • location extraction: extracts the location of the performance (i.e. the name of the theater, church, etc.); stores a new CSV file in the folder data

  • title extraction: extracts the bare title from the title metadata; stores a new CSV file in the folder data

  • genre extraction: extracts the opera genre from the title; stores a new CSV file in the folder data

  • occasion extraction: extracts the occasion of the performance (e.g. carnival, fair); stores a new CSV file in the folder data

  • quick fixes: improves the composer extraction and the Wikimedia linking; stores a new CSV file in the folder data

  • data: contains all the generated CSV files, ordered from oldest to most recent (librettos_8 is the final version). It also contains a ground truth with the expected and observed entities for 20 randomly chosen libretti.
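
The fuzzy place extraction step described above could be sketched roughly as follows. This is a minimal illustration built on Python's standard difflib module, not the repository's actual implementation: the city list, the function name fuzzy_city and the 0.8 similarity cutoff are all assumptions made for the example.

```python
import difflib

# Hypothetical gazetteer; the real pipeline may use a different city list.
KNOWN_CITIES = ["Venezia", "Milano", "Napoli", "Roma", "Firenze"]


def fuzzy_city(ocr_text, cities=KNOWN_CITIES, cutoff=0.8):
    """Return the first known city that approximately matches a word of ocr_text."""
    # Compare in lower case, but return the canonical spelling.
    lookup = {city.lower(): city for city in cities}
    for token in ocr_text.split():
        word = token.strip(".,;:()[]").lower()
        matches = difflib.get_close_matches(word, list(lookup), n=1, cutoff=cutoff)
        if matches:
            return lookup[matches[0]]
    return None  # no word was close enough to any known city


print(fuzzy_city("In Venetia, 1740"))  # OCR or spelling variant of "Venezia"
```

A higher cutoff gives fewer false positives on short words, at the cost of missing heavily garbled OCR output.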

Visualization

  • index.html: the entry page, which provides the basic structure of the visualization; the JavaScript code builds on top of it.

  • code/scripts: contains the Python scripts that preprocess and prepare the data for the visualization, e.g. collecting all links between libretti that share a composer or a title.

  • js/mapIntegration.js: builds the page structure by manipulating the DOM and contains most of the visualization logic, e.g. mapping theaters, visualizing links, and browsing the libretti over time.

  • css/style.css: the single CSS file that provides the styling for the visualization.
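
As an illustration of the kind of preprocessing done in code/scripts, the sketch below links every pair of libretti that share a composer. It is one possible reading of "common composer links", not the repository's code; the field names id and composer and the function name shared_composer_links are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shared_composer_links(rows):
    """Return sorted (id_a, id_b, composer) triples for libretti sharing a composer."""
    by_composer = defaultdict(list)
    for row in rows:
        composer = row.get("composer")
        if composer:  # skip rows where no composer was extracted
            by_composer[composer].append(row["id"])

    links = []
    for composer, ids in by_composer.items():
        # Every unordered pair of distinct libretti becomes one link.
        for a, b in combinations(sorted(set(ids)), 2):
            links.append((a, b, composer))
    return sorted(links)


# Made-up rows, for illustration only.
rows = [
    {"id": "L1", "composer": "Vivaldi"},
    {"id": "L2", "composer": "Vivaldi"},
    {"id": "L3", "composer": "Galuppi"},
    {"id": "L4", "composer": ""},
]
print(shared_composer_links(rows))
```

The same grouping would work for titles by swapping the key field.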

To develop the visualization locally

The existing code base can be run and developed on a local machine. To work around Cross-Origin Resource Sharing (CORS) restrictions, copy the Python script below into the parent directory and run it there, so that your machine serves the data and the visualization can load it locally. The server listens on port 8000 unless a different port is passed as the first argument.

#!/usr/bin/env python3
"""Serve the current directory over HTTP with a permissive CORS header."""
from http.server import HTTPServer, SimpleHTTPRequestHandler, test
import sys


class CORSRequestHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Allow requests from any origin, then finish the headers as usual.
        self.send_header('Access-Control-Allow-Origin', '*')
        super().end_headers()


if __name__ == '__main__':
    # Optional first argument: port number (defaults to 8000).
    port = int(sys.argv[1]) if len(sys.argv) > 1 else 8000
    test(CORSRequestHandler, HTTPServer, port=port)
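
To check that such a server really sends the CORS header, something like the following sketch can help. It redefines the same handler, starts it on a free local port in a background thread, and reads the header back with urllib. The helper name cors_header_for is hypothetical, and the approach assumes Python 3.7 or later (for the handler's directory argument).

```python
import threading
import urllib.request
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


class CORSRequestHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()

    def log_message(self, format, *args):
        pass  # keep the check quiet


def cors_header_for(directory="."):
    """Serve `directory` briefly and return the CORS header of the response to '/'."""
    handler = partial(CORSRequestHandler, directory=directory)
    server = HTTPServer(("127.0.0.1", 0), handler)  # port 0: let the OS pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        with urllib.request.urlopen(url) as response:
            return response.headers.get("Access-Control-Allow-Origin")
    finally:
        server.shutdown()
        server.server_close()


print(cors_header_for())
```

If the header comes back as None, the handler is not the one serving the request.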

Authors

  • Harshdeep
  • Aurel Maeder
  • Ludovica Schaerf
