example-datasets's Introduction

Note: this repository is very big (> 1 GB). Cloning it will take a while and require a lot of disk space.

ArangoDB Example graph data

This repository contains datasets organized in graphs to be used with ArangoDB.

More about ArangoDB and graphs: Graph documentation

Fake user data

The "RandomUsers" directory contains files with random users.

{
  "name": {
    "first": "Diedre",
    "last": "Clinton"
  },
  "gender": "female",
  "birthday": "1959-11-06",
  "contact": {
    "address": {
      "street": "2 Fraser Ave",
      "zip": "81223",
      "city": "Cotopaxi",
      "state": "CO"
    },
    "email": [
      "[email protected]",
      "[email protected]",
      "[email protected]"
    ],
    "region": "719",
    "phone": ["719-7055896"]
  },
  "likes": ["swimming"],
  "memberSince": "2009-03-14"
}

In order to import these users, use:

arangoimp --file names_XXX.json --collection=users --create-collection=true --type=json

where XXX is 100, 1000, 10000, 100000, 200000, 300000.
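The same command can be generated for each file size. The following is a minimal sketch, assuming the files follow the names_XXX.json naming shown above. The helper `import_command` is hypothetical and only builds the argument list; it does not run arangoimp.

```python
# Sizes listed in the text above.
SIZES = [100, 1000, 10000, 100000, 200000, 300000]


def import_command(size, collection="users"):
    """Build the arangoimp argument list for one names_XXX.json file.

    Hypothetical helper: it reuses only the flags shown in the README.
    """
    return [
        "arangoimp",
        "--file", f"names_{size}.json",
        f"--collection={collection}",
        "--create-collection=true",
        "--type=json",
    ]


if __name__ == "__main__":
    # Print one command line per available file size.
    for size in SIZES:
        print(" ".join(import_command(size)))
```

The list form can be passed to `subprocess.run` as-is if you prefer to launch the imports from a script.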

Cities

The Cities directory contains a list of cities with geo information. There are roughly 320000 cities.

locId,country,region,city,postalCode,latitude,longitude,metroCode,areaCode
1,"O1","","","",0.0000,0.0000,,

In order to import these cities, use

arangoimp --file GeoLiteCity.csv --collection=cities --create-collection=true --type=csv
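To inspect the data before importing, the CSV can be read with a few lines of Python. This is a sketch only: it assumes the file starts with the header line shown above, and the function name `read_cities` is made up for illustration.

```python
import csv
import io

# Header and first row copied from the sample above.
SAMPLE = (
    "locId,country,region,city,postalCode,latitude,longitude,metroCode,areaCode\n"
    '1,"O1","","","",0.0000,0.0000,,\n'
)


def read_cities(handle):
    """Yield one dict per CSV row, with the coordinates converted to floats."""
    for row in csv.DictReader(handle):
        row["latitude"] = float(row["latitude"])
        row["longitude"] = float(row["longitude"])
        yield row


if __name__ == "__main__":
    # For the real file, pass open("GeoLiteCity.csv", newline="") instead.
    for city in read_cities(io.StringIO(SAMPLE)):
        print(city["locId"], city["country"], city["latitude"], city["longitude"])
```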

Countries

The Countries directory contains a list of countries with Wikipedia links. There are 241 countries.

"id","code","name","continent","wikipedia_link","keywords"
1,"AD","Andorra","EU","http://en.wikipedia.org/wiki/Andorra",

In order to import these countries, use

arangoimp --file countries.csv --collection=countries --create-collection=true --type=csv

Regions

There are roughly 4100 regions with Wikipedia links.

"id","code","local_code","name","continent","iso_country","wikipedia_link","keywords"
1,"AD-02",02,"Canillo","EU","AD","http://en.wikipedia.org/wiki/Canillo",

In order to import these regions, use

arangoimp --file regions.csv --collection=regions --create-collection=true --type=csv

McDonalds

There are roughly 1200 geo coordinates of McDonald's restaurants in France.

lat,long
42.524330,2.833970

In order to import these, use

arangoimp --file france.csv --collection=mcdonalds --create-collection=true --type=csv

Bezirke

The Bezirke directory contains a list of German counties with geo information. There are 169431 Bezirke.

RC,UFI,UNI,LAT,LONG,DMS_LAT,DMS_LONG,MGRS,JOG,FC,DSG,PC,CC1,ADM1,POP,ELEV,CC2,NT,LC,SHORT_FORM,GENERIC,SORT_NAME_RO,FULL_NAME_RO,FULL_NAME_ND_RO,SORT_NAME_RG,FULL_NAME_RG,FULL_NAME_ND_RG,NOTE,MODIFY_DATE
1,6132652,6143433,52.5,13.283333,523000,131700,33UUU8347218037,NN33-10,A,ADM2,,GM,16,,,,N,,Charlottenburg-Wilmersdorf,Bezirk,BEZIRKCHARLOTTENBURGWILMERSDORF,Bezirk Charlottenburg-Wilmersdorf,Bezirk Charlottenburg-Wilmersdorf,CHARLOTTENBURGWILMERSDORF BEZIRK,"Charlottenburg-Wilmersdorf, Bezirk","Charlottenburg-Wilmersdorf, Bezirk",,2001-12-20

In order to import these counties, use

arangoimp --file bezirke.csv --collection=bezirke --create-collection=true --type=csv

Airports

The Airports directory contains a list of airports with geo information. There are roughly 44000 airports.

"id","ident","type","name","latitude_deg","longitude_deg","elevation_ft","continent","iso_country","iso_region","municipality","scheduled_service","gps_code","iata_code","local_code","home_link","wikipedia_link","keywords"
6523,"00A","heliport","Total Rf Heliport",40.07080078125,-74.9336013793945,11,"NA","US","US-PA","Bensalem","no","00A",,"00A",,,

In order to import these airports, use

arangoimp --file airports.csv --collection=airports --create-collection=true --type=csv

wikiimporter

wikiimporter is a converter for Wikipedia dumps written in Ruby by Sebastian Cohnen, see https://github.com/tisba/wikiimporter. Downloading the Wikipedia dump will take some time, as it is roughly 2.5 GB.

cd wikiimporter
mkdir data
mkdir log

sudo bundle install

curl `./bin/getlatestdumpurl.rb` -o data/wiki.xml.bz2 
bzcat data/wiki.xml.bz2 | ./bin/wikixml2json.rb --max-pages 10000 > data/articles.json

arangoimp --file data/articles.json --collection=wiki --create-collection=true --type=json

NerdPursuit

See https://github.com/Nerds/NerdPursuit for details. Each question is stored in its own file. So, you must create a file with all questions first:

./nerd_pursuit_compress.sh

and then import the generated file using

arangoimp --file nerd_pursuit_compressed.json --collection=nerds --create-collection=true --type=json
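The merge step can be pictured as follows. This is not the repository's nerd_pursuit_compress.sh but a hedged sketch of the same idea: it assumes each question is a standalone .json file in one directory and writes one JSON document per line. The function name, the field names, and the output name are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def merge_questions(question_dir, output_file):
    """Write every *.json file in question_dir as one line of output_file.

    Returns the number of questions written.
    """
    count = 0
    with open(output_file, "w", encoding="utf-8") as out:
        for path in sorted(Path(question_dir).glob("*.json")):
            with open(path, encoding="utf-8") as src:
                question = json.load(src)
            out.write(json.dumps(question) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Demo with two made-up question files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(2):
            Path(tmp, f"q{i}.json").write_text(
                json.dumps({"question": f"placeholder {i}"}), encoding="utf-8"
            )
        merged = Path(tmp, "merged.jsonl")
        print(merge_questions(tmp, merged))
```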

IP Address Ranges

The IPRanges directory contains IP address ranges with geo information. There are 3.7 million ranges.

{
  "locId" : "17",
  "endIpNum" : "16777471",
  "startIpNum" : "16777216", 
  "geo" : [ -27, 133 ] 
}

In order to import these locations, use:

arangoimp --file geoblocks.json --collection=ip_ranges --create-collection=true --type=json
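The sketch below shows how a dotted address relates to such a range. It assumes that startIpNum and endIpNum hold the integer form of IPv4 addresses and that both bounds are inclusive; the helper names `ip_to_int` and `in_range` are made up for illustration.

```python
import ipaddress

# Sample document copied from the text above.
SAMPLE_RANGE = {
    "locId": "17",
    "endIpNum": "16777471",
    "startIpNum": "16777216",
    "geo": [-27, 133],
}


def ip_to_int(address):
    """Convert a dotted IPv4 address to its integer form."""
    return int(ipaddress.IPv4Address(address))


def in_range(address, ip_range):
    """Return True if address lies within the range (bounds assumed inclusive)."""
    value = ip_to_int(address)
    # The sample stores the bounds as strings, so convert before comparing.
    return int(ip_range["startIpNum"]) <= value <= int(ip_range["endIpNum"])


if __name__ == "__main__":
    print(ip_to_int("1.0.0.0"))
    print(in_range("1.0.0.42", SAMPLE_RANGE))
    print(in_range("1.0.1.0", SAMPLE_RANGE))
```

Because the bounds are stored as strings in the sample, numeric range queries inside the database would likewise need them converted to numbers first.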

The DBLP Computer Science Bibliography

Download the data:

./dblp-download.sh

This creates an XML file dblp.xml (roughly 1.1 GB). To convert the file to JSON, run:

python dblp2json.py dblp.xml > dblp.json
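The conversion from XML to JSON can be sketched with the standard library. This is not the repository's dblp2json.py: the record layout in `SAMPLE_XML` is a made-up example, and `records_to_json` is a hypothetical name. The real dblp.xml may also need extra handling, for example for character entities declared in its DTD, which this sketch does not cover.

```python
import io
import json
import xml.etree.ElementTree as ET

# Made-up sample in the general shape of a bibliography XML file.
SAMPLE_XML = """<dblp>
  <article key="example/1">
    <author>A. Author</author>
    <author>B. Author</author>
    <title>An Example Title</title>
    <year>2000</year>
  </article>
</dblp>"""


def records_to_json(handle):
    """Yield one JSON string per direct child of the root element."""
    depth = 0
    for event, elem in ET.iterparse(handle, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth != 1:
            # Only elements directly below the root are treated as records.
            continue
        record = {"type": elem.tag, "key": elem.get("key")}
        for child in elem:
            # Repeated tags such as <author> are collected into lists.
            record.setdefault(child.tag, []).append(child.text)
        elem.clear()  # free the memory held by the finished record
        yield json.dumps(record)


if __name__ == "__main__":
    for line in records_to_json(io.BytesIO(SAMPLE_XML.encode("utf-8"))):
        print(line)
```

Streaming with `iterparse` avoids loading the whole file into memory, which matters for an input of this size.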

Graphs Airline Company

The Graphs/AirlineCompany directory contains a subset of the airports, together with the flight routes of an imaginary airline company between them. Most of the flights start from Cologne Airport (CGN).

In order to import this data use

  unix> arangorestore --input-directory "<path-to>/AirlineCompany"

If you want to create a graph for this data use

  unix> arangosh
  arangosh> var Graph = require("org/arangodb/graph").Graph;
  arangosh> new Graph("Airline", "airports", "flights");

Graphs IMDB

The Graphs/IMDB directory contains a dataset taken from IMDB (http://www.imdb.com).

In order to import this data execute the following command:

  unix> arangosh
  arangosh> require("internal").load("Graphs/IMDB/import.js");

This dataset has been used for the screencast of the graph visualisation tool.

Debian Dependency Graph

The Debian Linux distribution consists of packages that relate to each other through dependencies, which require or recommend that other packages be installed. Conflicts are another possible relation; they prohibit two packages from being installed at the same time. The script used to gather this graph data is available alongside pyarango. However, translating the Debian package database into ArangoDB documents takes a while, so here is a dump of the Debian Jessie package database.

Since this is a dump of a complete database, you can use arangorestore to import it. We will create a separate database, debianGraph, so that it doesn't interfere with your existing data:

unix> arangorestore --input-directory DebianDependencyGraph/ \
    --create-collection true \
    --include-system-collections true \
    --create-database true \
    --server.database debianGraph

Using the ArangoDB graph viewer, we can browse random starting points in the graph.

Game of Thrones

Small, multi-purpose dataset including a small graph of parents and children (Characters --ChildOf--> Characters). See README for details.

Amazon Meta

Amazon product co-purchasing network metadata (summer 2006) with product metadata for over half a million different products from different categories: https://snap.stanford.edu/data/amazon-meta.html

example-datasets's People

Contributors

13abylon, dothebart, fceller, hkernbach, joerg84, jsteemann, mchacki, nerpaula, simran-b

example-datasets's Issues

ZKD index on IPRanges dataset is very efficient

Hello,

Just for information: with the IPRanges dataset, a ZKD index on [ "startIpNum", "endIpNum" ] proves to be very efficient. Although it has "only" 2 dimensions, this dataset could probably be flagged as a good example use case for ZKD indexes.

Greetings

Importing graph data in Windows

I have recently been experimenting with ArangoDB for Windows. I am able to import some of the sample data here using arangoimp.exe. I would like to be able to try some sample queries with the graphs you have in this repository, e.g. IMDB. I cannot see how I can import this on the Windows platform. Can anyone help? Many thanks.

Error while loading imdb dataset

Hi,

I tried loading the IMDB dataset to try out ArangoSearch. I'm on Windows, so I changed into the folder containing import.js, edges.json and nodes.json. From there I started arangosh, switched to my database and ran require("internal").load("import.js"). I found that the graph object returned by gm._create does not have a drop function or an addVertex function. g is indeed a Graph object, though, and the graph is created in my database. Is this a deprecated API, given that I'm running 3.4?

Thanks!

Error property '_key' of undefined when importing sample datasets IMDB and marvel

Hi,

I'm using arangodbcli 1.4.5 on OS X.
I've tried to import the marvel and IMDB datasets, and for both I get this error:

JavaScript exception in file '/org/arangodb/graph' at 268,40: TypeError: Cannot read property '_key' of undefined
! params._from = out_vertex._properties._key;
! ^
stacktrace: TypeError: Cannot read property '_key' of undefined
at Graph._saveEdge (/org/arangodb/graph:268:40)
at Graph.addEdge (/org/arangodb/graph-common:731:15)
at Function.storeVertex ((shell):40:9)
at (shell):62:12
at (shell):64:2

IMDB

I am a student at Technische Hochschule Mittelhessen and at the moment I am trying to use the graph part of ArangoDB. For that reason I've imported your IMDB dataset, but contrary to what you mention on GitHub, no graph "imdb" is created. I only get the two collections imdb_vertices and imdb_edges. I am really new to ArangoDB, so can you please tell me whether it is possible to create a new graph from the shell, and if so, how?

Windows import issues

I could have sworn I posted this already. But I don't see it, so I'm posting "again".

Arango 3.4.1
Windows 10 pro amd64 latest build.

When trying to import IMDB I follow these steps:
1- cd into the IMDB folder.
2- run arangosh
3- require("internal").load("import.js")

There are 2 errors:

First error is that, no matter what, it says the graph already exists.
To fix, I remove this code:

try {
    g = gm._create(gName);
    g.drop();
  } catch (e) {

  }

Second error:

127.0.0.1:8529@_system> require("internal").load("import.js")
2019-01-22T18:12:16Z [2096] ERROR JavaScript exception in file 'import.js' at 32,7: TypeError: g.addVertex is not a function
2019-01-22T18:12:16Z [2096] ERROR !    g.addVertex(d._key, d);
2019-01-22T18:12:16Z [2096] ERROR !      ^
JavaScript exception in file 'import.js' at 32,7: TypeError: g.addVertex is not a function
!    g.addVertex(d._key, d);
!      ^
stacktrace: TypeError: g.addVertex is not a function
    at Function.storeVertex (import.js:32:7)
    at import.js:61:12
    at import.js:63:2
    at <shell command>:1:21

Please help! I'm a Linux person, but I'm trying to make this project 100% Windows. (Just saying that because I've seen "just use bash" suggested in other issues...)

Thanks!
