wilayah's Introduction

Indonesian Administrative Divisions as Linked Data

Having a common vocabulary for identifying places in Indonesia is essential for synergising development efforts across multiple stakeholders. However, at present, different organizations refer to the same places by different names. Additionally, existing efforts to identify places in Indonesia, such as those identified by GeoNames, are generally incomplete and may not reflect the actual structure of administrative divisions in Indonesia. Thankfully, through the use of Linked Data, it is possible to align these disparate representations using predicates like owl:sameAs.

This repository aims to create a reference for identifying administrative divisions in Indonesia for use in Linked Data applications, such as BenangMerah. BenangMerah uses this data to link places in Indonesia with statistics about the places as well as social projects and organizations active in those places.

The contents of this repository are as follows:

  1. A Node.js script that generates RDF triples from reference documents.
  2. Reference documents to generate the triples from.
  3. The resulting RDF triples, in Turtle format.

Additionally, a set of URI conventions is used to identify the Indonesian administrative divisions referenced in the triples. The conventions are described in this readme.

Ontology

A custom OWL ontology (TBox), not directly based on any other ontology, describes the concepts needed to represent administrative divisions in Indonesia. OWL classes represent the classes of administrative divisions: Provinsi, Kabupaten, Kota, Kecamatan, Distrik, Desa, Kelurahan, etc. OWL object properties denote the parent-child relationships in the hierarchy of administrative divisions.

At the moment, the ontology is available in Turtle format from this repository. However, in the future, the BenangMerah ontology will be split off into a different repo. The URIs used will stay the same.

How it works

The instances RDF graph (ABox) is generated by a custom Node.js script. Its input is a CSV extracted using Tabula from the PDF of Buku Induk Kode dan Wilayah Administrasi Pemerintahan Per Provinsi, Kabupaten/Kota dan Kecamatan Seluruh Indonesia, which was legalised as Lampiran I of Permendagri No. 18/2013. Several mistruncated words in the CSV were corrected based on information in the document itself, and abbreviations were expanded.
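The cleanup step can be pictured with a minimal sketch. It assumes a hypothetical two-column CSV layout (code, name) and illustrative sample rows; the actual Tabula export in this repository may be laid out differently, and the function names `expandAbbreviations` and `parseRows` are not taken from the script.

```javascript
// Hypothetical input: one "code,name" pair per line.
// The values are illustrative, not copied from the reference document.
const sampleCsv = [
  '11,Aceh',
  '1101,Kab. Aceh Selatan',
  '110101,Bakongan'
].join('\n');

// Expand the "Kab." abbreviation into the full word "Kabupaten".
function expandAbbreviations(name) {
  return name.replace(/\bKab\.\s*/g, 'Kabupaten ');
}

// Split the CSV text into { code, name } records.
// This simple parser does not handle quoted fields.
function parseRows(csvText) {
  return csvText
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.length > 0)
    .map(line => {
      const commaIndex = line.indexOf(',');
      return {
        code: line.slice(0, commaIndex).trim(),
        name: expandAbbreviations(line.slice(commaIndex + 1).trim())
      };
    });
}

const rows = parseRows(sampleCsv);
console.log(rows);
```

Keeping the abbreviation expansion in its own function makes it easy to add further corrections, such as the mistruncated words mentioned above, without touching the parsing logic.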

Note that the Permendagri does not include recently established divisions, such as the province of Kalimantan Utara and many kabupaten around Indonesia. Nonetheless, this knowledge base uses the Permendagri as its basis. Other possible sources, such as http://kodepos.nomor.net/, will be incorporated in the future.

URIs

URIs are used to identify Linked Data resources, in this case the Indonesian administrative divisions. Each division is referred to by two URIs, whose equivalence is asserted using owl:sameAs.

BPS Code URIs (preferred)

The Indonesian government maintains numeric codes for administrative divisions. These codes are reused by other governmental bodies, including in their datasets. As such, these URIs are preferred for linking government-sourced datasets. The URI pattern is as follows:

http://benangmerah.net/place/idn/bps/[bps-code]

bps-code refers to the BPS code, which is a two-digit (for provinces), four-digit (for kabupaten/kota), or six-digit (for kecamatan) number.
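As a rough sketch of how this pattern could be applied, the helpers below build a BPS code URI and infer the administrative level from the code length. The function names (`bpsLevel`, `bpsUri`, `parentBpsCode`) are hypothetical and not part of this repository's script. The parent lookup relies on the codes being hierarchical, so that a longer code begins with the code of its parent division.

```javascript
const BPS_BASE = 'http://benangmerah.net/place/idn/bps/';

// Infer the administrative level from the length of the code.
// Codes are passed as strings so the digits are kept exactly as given.
function bpsLevel(code) {
  if (!/^\d+$/.test(code)) {
    throw new Error('BPS code must be numeric: ' + code);
  }
  switch (code.length) {
    case 2: return 'provinsi';
    case 4: return 'kabupaten-kota';
    case 6: return 'kecamatan';
    default: throw new Error('Unexpected BPS code length: ' + code);
  }
}

// Build the BPS code URI, rejecting codes of an unexpected shape.
function bpsUri(code) {
  bpsLevel(code); // throws if the code is not 2, 4, or 6 digits
  return BPS_BASE + code;
}

// Derive the parent division's code by dropping the last two digits.
// Provinces have no parent in this scheme, so null is returned.
function parentBpsCode(code) {
  return bpsLevel(code) === 'provinsi' ? null : code.slice(0, -2);
}

console.log(bpsUri('1101'));
console.log(parentBpsCode('110101'));
```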

Hierarchical URIs

Since administrative divisions follow a hierarchy, much like files and directories in a filesystem, a similar way of addressing is used. The base URI pattern is as follows:

http://benangmerah.net/place/idn/[provinsi]/[kabupaten-kota]/[kecamatan]

Where:

  • provinsi is the slugified name of the province, according to the Permendagri. Note that:
    • Daerah Istimewa Yogyakarta, referred to as Daista Yogyakarta in the Permendagri and DI Yogyakarta by BPS, is written as di-yogyakarta, not daerah-istimewa-yogyakarta, daista-yogyakarta, yogyakarta, or diy.
    • DKI Jakarta, on the other hand, is written as dki-jakarta, not daerah-khusus-ibukota-jakarta, jakarta, or dki.
    • Aceh is written as aceh, as that is its official name according to UU No. 11/2006.
  • kabupaten-kota is the slugified name of the kabupaten/kota, including the word kabupaten or kota. The abbreviation Kab. in the Permendagri is expanded. Note that the subdivisions of DKI Jakarta are officially termed "Kota Administratif" and "Kabupaten Administratif".
  • kecamatan is the slugified name of the kecamatan/distrik, not including the word kecamatan or distrik.

The slugified names are generated using the slugify function of underscore.string.
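The hierarchical pattern can be illustrated with a short sketch. To stay self-contained it uses a simplified stand-in for underscore.string's slugify that only lowercases ASCII letters and digits and joins them with hyphens; the real library function also handles accented characters. The helper name `hierarchicalUri` is hypothetical, and the names passed in are assumed to be normalised already (for example DI Yogyakarta rather than Daista Yogyakarta).

```javascript
const PLACE_BASE = 'http://benangmerah.net/place/idn/';

// Simplified stand-in for underscore.string's slugify (an assumption,
// not the library's implementation): lowercase, hyphen-separated.
function slugify(name) {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Build a hierarchical URI from the province down to the deepest
// level provided. Segments after the first missing level are ignored.
function hierarchicalUri(provinsi, kabupatenKota, kecamatan) {
  const segments = [];
  for (const part of [provinsi, kabupatenKota, kecamatan]) {
    if (!part) break;
    segments.push(slugify(part));
  }
  if (segments.length === 0) {
    throw new Error('At least a province name is required');
  }
  return PLACE_BASE + segments.join('/');
}

console.log(hierarchicalUri('DKI Jakarta'));
// prints "http://benangmerah.net/place/idn/dki-jakarta"
console.log(hierarchicalUri('Aceh', 'Kabupaten Aceh Selatan', 'Bakongan'));
// prints "http://benangmerah.net/place/idn/aceh/kabupaten-aceh-selatan/bakongan"
```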

These URI conventions can be compared to other ontologies/resources:

  • GeoNames uses codes for places, appended to the base GeoNames URI.
  • DBpedia uses Wikipedia titles.

The instances graph

Each resource is rdfs:label-ed by its name according to the Permendagri.

The instances RDF graph is available in Turtle format from this repository.
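To give a feel for what one resource might look like, the sketch below serialises a single division to Turtle with its rdfs:label and the owl:sameAs link between its two URIs. Which URI appears as the subject, the exact layout of the published file, and the helper name `toTurtle` are assumptions. The ontology's class and property URIs are omitted because they are not listed in this readme.

```javascript
// Serialise one division as a small Turtle document.
function toTurtle(division) {
  // Escape backslashes and double quotes for a Turtle string literal.
  const label = division.label
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"');

  return [
    '@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .',
    '@prefix owl: <http://www.w3.org/2002/07/owl#> .',
    '',
    '<' + division.bpsUri + '>',
    '    rdfs:label "' + label + '" ;',
    '    owl:sameAs <' + division.hierarchicalUri + '> .'
  ].join('\n');
}

// Illustrative values following the URI conventions in this readme.
const aceh = {
  bpsUri: 'http://benangmerah.net/place/idn/bps/11',
  hierarchicalUri: 'http://benangmerah.net/place/idn/aceh',
  label: 'Aceh'
};

const turtle = toTurtle(aceh);
console.log(turtle);
```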

The script

As a CLI script:

node main.js [-o turtle_output_file_name]

As a module:

var wilayah = require('benangmerah-wilayah');

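wilayah.getTripleStore(function(err, tripleStore) {
	if (err) {
		// Generation failed; tripleStore is not available
		return console.error(err);
	}
	// tripleStore is an instance of N3Store containing the triples
});

wilayah.writeTriples('turtle_output_file_name', function(err) {
	if (err) {
		// Writing failed
		return console.error(err);
	}
	// Turtle successfully written
});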

About BenangMerah

BenangMerah is an effort to collect data on social development in Indonesia into a knowledge base based on Semantic Web/Linked Data technologies.

BenangMerah is developed by Andhika Nugraha, a student at Institut Teknologi Bandung.
