data's Introduction

Histograph

Historical geocoder, built with Neo4j and Elasticsearch. For more information, see histograph.io or the GitHub repository of one of the components.

Core components

| Name | GitHub | Description |
|---|---|---|
| Core | histograph/core | Consumes Redis queue and calls Graphmalizer |
| API | histograph/api | Search API |
| IO | histograph/io | Input/output API |
| Schemas | histograph/schemas | Ontology and JSON schemas |
| Config | histograph/config | Configuration module |
| Neo4j plugin | histograph/neo4j-plugin | Server plugin for Neo4j for complex graph queries needed by search API |
| Viewer | histograph/viewer | Map viewer, React + Leaflet + D3.js |
| Data | histograph/data | Scripts to download and generate Histograph datasets from selection of sources: GeoNames, TGN, BAG, NWB |
| Import | histograph/import | Scripts to import data into Histograph API |
| Stats | histograph/stats | Runs set of queries on a specified interval to compute data statistics |
| Website | histograph/histograph.github.io | Histograph website on histograph.io |
| Fuzzy dates | histograph/fuzzy-dates | Parses dates, years, and ranges |
| URI normalizer | histograph/uri-normalizer | Converts URIs and Histograph IDs to URNs |
| Tests | histograph/tests | Test suite for Histograph API |

Data

| Name | GitHub | Description |
|---|---|---|
| Concordances | histograph/concordances | Equivalence relations between places in the Netherlands from different data sources: GeoNames, TGN, BAG and Who's on First |

Graphmalizer

Histograph uses Graphmalizer to convert a stream of messages (create/delete/update of PITs and relations) to a graph in Neo4j.

| Name | GitHub | Description |
|---|---|---|
| Graphmalizer Core | graphmalizer/graphmalizer-core | Reads message queue, creates Neo4j graph |

Tools

| Name | GitHub | Description |
|---|---|---|
| Installation | histograph/installation | Docs on installing Histograph on your laptop or a server |
| AWS tools | histograph/aws | Tools and scripts for deploying Histograph on Amazon Web Services |
| Quickstart | histograph/quickstart | Quick start scripts for Histograph |
| PITs to GeoJSON | histograph/pits-to-geojson | Convert PIT NDJSON to GeoJSON file |
| Reasoner | histograph/reasoner | Finds matching PITs from different datasets and creates links between them (work in progress) |
| Relationizer | histograph/relationizer | Web interface to find PITs and easily create relations between them (work in progress) |
| Stats viewer | histograph/stats-viewer | Visualizations for JSON results of stats module (work in progress) |
| Cypher examples | histograph/cypher-examples | Cypher query examples |

The MIT License (MIT)

Copyright (C) 2015 Waag Society.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data's People

Contributors

bertspaan, jobspierings, wires

data's Issues

Running without arguments only builds `geonames`

I would expect all of them to build, but it only runs geonames.

histograph@vps662:~/data$ node index.js --config ../api-erfgeo-nl.config.yml
Processing source geonames:
  Executing step download...
    Done!
  Executing step convert...

It works when you explicitly specify what to build:

histograph@vps662:~/data$ node index.js --config ../api-erfgeo-nl.config.yml tgn
Processing source tgn:
  Executing step download...^C
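
The behaviour the reporter expects (build everything when no source is named) could be sketched roughly as below. This is only an illustration, not the tool's actual code: `selectSources` is a hypothetical helper, and the assumption that every key under `data` in the configuration is a buildable source is mine.

```javascript
'use strict';

// Hypothetical helper: decide which sources to process.
// `requested` is the list of source names given on the command line,
// `config` is the parsed Histograph configuration object.
function selectSources(requested, config) {
  // Assumption: every key under `data` names a source that can be built.
  const available = Object.keys((config && config.data) || {});

  // No names given: fall back to every configured source.
  if (!requested || requested.length === 0) {
    return available;
  }

  // Names given: reject anything that is not configured, so typos fail early.
  const unknown = requested.filter(function (name) {
    return available.indexOf(name) === -1;
  });
  if (unknown.length > 0) {
    throw new Error('Unknown source(s): ' + unknown.join(', '));
  }
  return requested;
}

const sampleConfig = { data: { geonames: {}, tgn: {}, bag: {} } };
console.log(selectSources([], sampleConfig));      // all configured sources
console.log(selectSources(['tgn'], sampleConfig)); // only tgn
```

Failing on unknown names, rather than silently skipping them, keeps a mistyped source from looking like a successful empty run.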

Include and align DBpedia

  • The previous version of DBpedia was not mapped to anything and therefore produced duplicate PiTs. So a new import only makes sense if the set is mapped to GeoNames or another existing source.

[ ] find out what is in the original set (NL only?)
[ ] map to another source, and possibly supplement it?

Subsources!

Idea:

Allow subsets per data source, for example bag.streets, bag.buildings, tgn.places, tgn.terms, et cetera. An owner can create as many subsources as they like, as long as they own the parent source (or maybe subsources are created implicitly when PITs or relations are added to them). Users can query on parent sources (source=bag or source=bag.*), or on a single subsource (source=bag.streets).

This is a good idea, because:

  • Some data sources consist of different types of data, with different meta data (in data field) per PIT
  • Using subsources, Core itself can add a subsource for inferred relations (e.g. tgn.places.inferred)
  • It's easy to update/delete a subset of your data, without touching the rest

@mmmenno, @phpetra, @wires what do you think?

This should be implemented in API, IO, Core and Viewer as well!
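
To make the proposed query semantics concrete, here is a minimal sketch of how a `source=` filter might be matched against a PIT's source name. `matchesSource` is a hypothetical function, and two details are assumptions on my part: that `bag` and `bag.*` mean exactly the same thing, and that a subsource filter also selects anything nested below it (such as tgn.places.inferred under tgn.places).

```javascript
'use strict';

// Hypothetical matcher for the proposed `source=` query parameter.
// Returns true when `source` (for example "bag.streets") is selected by `filter`.
function matchesSource(filter, source) {
  // Assumption: "bag.*" is equivalent to "bag" (the parent plus all subsources).
  const parent = filter.endsWith('.*') ? filter.slice(0, -2) : filter;

  // Exact match on the source or subsource itself.
  if (source === parent) {
    return true;
  }

  // Otherwise the source must be nested below the filter: "<parent>.<something>".
  return source.startsWith(parent + '.');
}

console.log(matchesSource('bag', 'bag.streets'));           // true
console.log(matchesSource('bag.*', 'bag.buildings'));       // true
console.log(matchesSource('bag.streets', 'bag.buildings')); // false
```

Matching on the `parent + '.'` prefix rather than on the bare prefix avoids accidentally selecting an unrelated source whose name merely starts with the same letters.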

Cshapes & Correlates of War

http://www.correlatesofwar.org/data-sets/folder_listing

Country identifiers going back to 1816. I think hierarchical and provenance relations can be deduced from this set. Since the Cshapes dataset we imported earlier has these codes, the Cshapes geometries might be extendable backwards in time with these listings, or at least the PiTs without geometries can be created.

To start with, I'm working on parsing this internally, but I thought I'd mention it here already, also because with all these relations, both temporal/provenance and hierarchical, this might be a nice test case for automatic inferencing.

Data tool fails on something (`AssertionError: missing path`)

This is the config (graphmalizer branch):

histograph@vps662:~/config$ node test.js --config ../api-erfgeo-nl.config.yml
{
  "api": {
    "host": "localhost",
    "port": 3000,
    "baseUrl": "https://api.erfgeo.nl/",
    "dataDir": "/home/histograph/uploads",
    "admin": {
      "name": "histograph",
      "password": "***********"
    }
  },
  "redis": {
    "host": "localhost",
    "port": 6379,
    "queue": "histograph"
  },
  "elasticsearch": {
    "host": "localhost",
    "port": 9200
  },
  "neo4j": {
    "host": "localhost",
    "port": 7474
  },
  "core": {
    "batchSize": 100,
    "batchTimeout": 250
  },
  "viewer": {
    "language": "en"
  },
  "data": {
    "geonames": {
      "countries": [
        "NL"
      ],
      "extraUris": "./extra-uris.json"
    },
    "tgn": {
      "parents": [
        "tgn:7016845"
      ]
    },
    "bag": {
      "db": {
        "host": "localhost",
        "port": 5432,
        "user": "postgres",
        "password": "postgres",
        "database": "bag"
      }
    }
  },
  "import": {
    "dirs": [
      "/home/histograph/data"
    ]
  },
  "logo": [
    "   ●───────●    ",
    "  /║       ║\\  ",
    " / ║       ║ \\ ",
    "●  ║═══════║  ● ",
    " \\ ║       ║ / ",
    "  \\║       ║/  ",
    "   ●───────●    "
  ]
}

This is the error (graphmalizer branch):

histograph@vps662:~/data$ node index.js --config ../api-erfgeo-nl.config.yml
Processing source geonames:
  Executing step download...
    Done!
  Executing step convert...

assert.js:86
  throw new assert.AssertionError({
        ^
AssertionError: missing path
    at Module.require (module.js:363:3)
    at require (module.js:384:17)
    at Object.<anonymous> (/home/histograph/data/pits-and-relations.js:3:14)
    at Module._compile (module.js:460:26)
    at Object.Module._extensions..js (module.js:478:10)
    at Module.load (module.js:355:32)
    at Function.Module._load (module.js:310:12)
    at Module.require (module.js:365:17)
    at require (module.js:384:17)
    at Object.exports.convert (/home/histograph/data/geonames/geonames.js:223:22)
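
In Node versions of that period, `require()` asserted that its argument was truthy, so `missing path` indicates that the `require` call on line 3 of pits-and-relations.js received an empty or undefined value. Which value that was cannot be read from the trace. Assuming the path comes from configuration, a guard along these lines would turn the bare assertion into a message that names the missing setting. `requireConfigured` and the setting name passed to it are hypothetical.

```javascript
'use strict';

// Hypothetical wrapper around require() for module paths that come from
// configuration. It fails with a descriptive message instead of the bare
// "missing path" assertion when the configured value is absent.
function requireConfigured(modulePath, settingName) {
  if (typeof modulePath !== 'string' || modulePath.length === 0) {
    throw new Error(
      'Configuration setting "' + settingName + '" is missing or empty; ' +
      'cannot load a module from it'
    );
  }
  return require(modulePath);
}

// Usage with a value that is present (a Node core module, for illustration):
const pathModule = requireConfigured('path', 'example.setting');
console.log(typeof pathModule.join); // function
```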

Include Kloeke codes

[ ] relevant user story for Kloeke codes?
[ ] do we have the most up-to-date set?
[ ] rights?

Include AMCO codes

[ ] are they free of rights?
[ ] do we have a recent version?
[ ] does the set still need to be mapped?

example of config for data source?

I've successfully installed histograph core and am trying to run data import scripts but get the error:

$ node index.js geonames

/home/dev/build/data/geonames/geonames.js:49
    var countryFilenames = config.countries.map(function(country) {
                                           ^
    TypeError: Cannot read property 'countries' of undefined

The README says to configure data sources in the Histograph configuration file. Do I understand correctly that the geonames importer expects a countries mapping in the Histograph configuration file? If so, can someone give an example of such a config?
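
For reference, a configuration dump in another issue on this page shows `countries` and `extraUris` under `data.geonames`. The sketch below reproduces that shape as a JavaScript object and adds a guard that reports a missing section clearly instead of failing on `config.countries`. `getCountries` is a hypothetical helper that takes the whole configuration object; it is not part of the importer.

```javascript
'use strict';

// Shape of the `data` section as shown in a configuration dump elsewhere on
// this page; the values are examples only.
const exampleConfig = {
  data: {
    geonames: {
      countries: ['NL'],
      extraUris: './extra-uris.json'
    }
  }
};

// Hypothetical guard: return the configured country codes, or fail with a
// message that says which part of the configuration is missing.
function getCountries(config) {
  const geonames = config && config.data && config.data.geonames;
  if (!geonames || !Array.isArray(geonames.countries)) {
    throw new Error('Missing data.geonames.countries in the Histograph configuration');
  }
  return geonames.countries;
}

console.log(getCountries(exampleConfig)); // [ 'NL' ]
```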

Create content for repository data-bag

With npm 3.3.12 and node 5.3.0, npm install fails with:

npm ERR! Darwin 15.2.0
npm ERR! argv "/usr/local/Cellar/node/5.3.0/bin/node" "/usr/local/bin/npm" "install"
npm ERR! node v5.3.0
npm ERR! npm  v3.3.12
npm ERR! code ELIFECYCLE

npm ERR! [email protected] install: `node-gyp rebuild`
npm ERR! Exit status 1

Upgrading to the latest release fixes the problem for me, but might break with other versions of node/npm?

Stefanos-MacBook-Pro:data SB$ git diff
diff --git a/package.json b/package.json
index afb2766..62b1d46 100644
--- a/package.json
+++ b/package.json
@@ -21,7 +21,7 @@
     "geojsonhint": "^1.0.0",
     "highland": "^2.5.1",
     "is-my-json-valid": "^2.10.1",
-    "level": "^0.19.1",
+    "level": "^1.4.0",
     "minimist": "^1.1.1",
     "ndjson": "^1.3.0",
     "pg": "^4.3.0",

Frisian place names?

A set created earlier by Gerard Kuys during the pilot phase.

[ ] status of this set?
[ ] worth the effort? (because of multilingualism?)

Branch data-modules isn't converting any data

I'm unclear on how this should work, but at the moment it isn't working. I switched to the data-modules branch of histograph-config, with a freshly pulled data-geonames:

data:                                       # Data module options (http://github.com/histograph/data)
  modulePrefix: data-
  baseDir: /home/reinv/data/Github/histograph
  generatedDir: /home/reinv/data/Github/histograph/generated-data
  modules:
    geonames:
      countries:
        - NL
      extraUris: ./extra-uris.json

But on:
reinv@lingui:~/data/Github/histograph/data$ node index.js --all

Using data modules in /home/reinv/data/Github/histograph/data-*
  Saving data to /home/reinv/data/Github/histograph/generated-data

No data modules found...

Usage: node index.js [--all] [--steps [step1,step2,...]] [--config /path/to/config.yml] [module ...]
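
Judging only from the output above, the tool looks for directories matching `baseDir` plus `modulePrefix` (here `/home/reinv/data/Github/histograph/data-*`). A rough sketch of such a discovery step can help check what a given `baseDir` actually contains. `findDataModules` is a hypothetical function written for this purpose and may differ from what the data-modules branch really does.

```javascript
'use strict';

const fs = require('fs');
const path = require('path');

// Hypothetical discovery step: list the directories in `baseDir` whose name
// starts with `modulePrefix`, and return the module names without the prefix
// (for example "data-geonames" becomes "geonames").
function findDataModules(baseDir, modulePrefix) {
  return fs.readdirSync(baseDir)
    .filter(function (name) {
      return name.indexOf(modulePrefix) === 0 &&
        name.length > modulePrefix.length &&
        fs.statSync(path.join(baseDir, name)).isDirectory();
    })
    .map(function (name) {
      return name.slice(modulePrefix.length);
    })
    .sort();
}
```

Calling `findDataModules('/home/reinv/data/Github/histograph', 'data-')` on the reporter's machine would show whether a `data-geonames` directory is visible at that location; an empty array would correspond to the "No data modules found..." message.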

Missing dependency 'adm-zip' on executing node index.js nwb

reinv@lingui:~/data/Github/histograph/data$ node index.js nwb
module.js:328
    throw err;
    ^

Error: Cannot find module 'adm-zip'
    at Function.Module._resolveFilename (module.js:326:15)
    at Function.Module._load (module.js:277:25)
    at Module.require (module.js:354:17)
    at require (internal/module.js:12:17)
    at Object.<anonymous> (/var/data/Github/histograph/data/nwb/nwb.js:6:14)
    at Module._compile (module.js:398:26)
    at Object.Module._extensions..js (module.js:405:10)
    at Module.load (module.js:344:32)
    at Function.Module._load (module.js:301:12)
    at Module.require (module.js:354:17)
reinv@lingui:~/data/Github/histograph/data$ npm install adm-zip
[email protected] /var/data/Github/histograph/data
└── [email protected] extraneous

npm WARN EPACKAGEJSON [email protected] No repository field.
reinv@lingui:~/data/Github/histograph/data$ node index.js nwb
Processing dataset nwb:
Executing step download...
Done!
Executing step convert...
Done!
Executing step done...
Done!

All datasets done!
Done!

All datasets done!

Failing to build `geonames`

node index.js geonames
Processing source geonames:
  Executing step download...
    Done!
  Executing step convert...
    Error: [{"field":"data","message":"no (or more than one) schemas match"}]
Error: [object Object]

All sources done!

?
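
The second line of the error output, `Error: [object Object]`, is what JavaScript produces when an array of error objects is concatenated into a string. A small formatter would make the log readable. The sketch below assumes the errors are an array of objects with `field` and `message` properties, as in the first error line; `formatValidationErrors` is a hypothetical name.

```javascript
'use strict';

// Hypothetical formatter for validation errors shaped like the ones in the
// log above: an array of objects with `field` and `message` properties.
function formatValidationErrors(errors) {
  return errors
    .map(function (err) {
      return err.field + ': ' + err.message;
    })
    .join('; ');
}

const sampleErrors = [
  { field: 'data', message: 'no (or more than one) schemas match' }
];

// String concatenation falls back to Object.prototype.toString:
console.log('Error: ' + sampleErrors);
// prints "Error: [object Object]"

console.log('Error: ' + formatValidationErrors(sampleErrors));
// prints "Error: data: no (or more than one) schemas match"
```

This only improves the message. The underlying failure, a converted record whose `data` field matches none or several of the schemas, still has to be tracked down in the converted output.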
