

Movies Databases

This repo contains code to generate several different databases containing the same data, for the Cambridge CST IA Databases course.

The data is a subset of the IMDb non-commercial datasets. The list of movies is filtered to only include popular movies, or those with useful properties such as duplicate names. The cast and crew data is then filtered down to only include those people relevant to this movies subset. The aim is to output ~1500 movies, to keep the database accessible even on basic hardware.

Three databases are created: SQLite 3, TinyDB and Neo4j, demonstrating relational, document and graph databases respectively.

Installation

Install Python 3.7 or later, clone this repo and then install the required dependencies:

pip install requests tinydb neo4j

You will also need to install Neo4j, along with Java 11, which Neo4j requires in order to run. This code was designed with the Community Server edition in mind, version 4.4 LTS. Examples may not work in the 5.x versions of Neo4j.

Usage

The main file is make_databases.py. This script will download the relevant IMDb files, and create the databases.

Run:

python make_databases.py

The SQLite and TinyDB outputs will be created if they do not exist, or emptied and recreated if they do. The script expects a Neo4j server to be already running on localhost on the default port; credentials should be configured in neo4j/neo4j_credentials.json in the form {"username": "neo4j", "password": "neo4j"}. All existing nodes and relationships in the database named neo4j will be deleted before the movies data is loaded; neo4j is the default, and only, database available in the Community Server edition.
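Before running the script, it can help to confirm that the credentials file parses and has both keys. The following is a minimal sketch, not part of this repo: the function name load_neo4j_credentials is hypothetical, and it assumes only the JSON shape described above.

```python
import json
from pathlib import Path


def load_neo4j_credentials(path="neo4j/neo4j_credentials.json"):
    """Read the credentials file and return (username, password).

    Hypothetical helper: it checks only the two keys named in this README.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = {"username", "password"} - data.keys()
    if missing:
        raise KeyError(f"credentials file is missing keys: {sorted(missing)}")
    return data["username"], data["password"]
```

The returned tuple has the shape the neo4j Python driver accepts as its auth argument, for example GraphDatabase.driver("bolt://localhost:7687", auth=(username, password)). Whether make_databases.py connects in exactly this way is an assumption.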

The script will create output/movies.sqlite and output/movies.tinydb.json, as well as loading the data into the neo4j database in the running Neo4j server.
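Once the script has finished, a quick sanity check is to look inside the two output files. This is a sketch using only the standard library; the function names are hypothetical. It makes no assumption about the table names this repo creates, but it does assume TinyDB's default JSON storage layout, in which the file is a single object mapping table names to objects of documents keyed by id.

```python
import json
import sqlite3


def sqlite_table_names(db_path="output/movies.sqlite"):
    """Return the sorted names of the user tables in a SQLite file."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Skip SQLite's internal bookkeeping tables such as sqlite_sequence.
    return [name for (name,) in rows if not name.startswith("sqlite_")]


def tinydb_document_counts(json_path="output/movies.tinydb.json"):
    """Return {table name: number of documents} for a TinyDB JSON file."""
    with open(json_path, encoding="utf-8") as f:
        tables = json.load(f)
    return {table: len(docs) for table, docs in tables.items()}
```

Calling print(sqlite_table_names()) and print(tinydb_document_counts()) from the repo root after a run should list what was generated.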

Derived outputs

This script creates the databases themselves. For SQLite and TinyDB, these are conveniently the single-file artefacts needed for someone to create their own version. Some additional artefacts are necessary:

SQL file

It is useful to have a plain SQL file of CREATE and INSERT statements for use with other relational databases. Once you have run the script, connect to the SQLite database with sqlite3 output/movies.sqlite and then dump the output to a file:

.output output/movies.sql
.dump

This file is not plain SQL; it will likely contain an SQLite PRAGMA command at the start. In a text editor, remove this line to produce a vendor-agnostic output.
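As an alternative to the sqlite3 shell and a text editor, the dump can be scripted. This is a minimal sketch, not part of this repo, using Connection.iterdump() from Python's standard sqlite3 module and skipping any statement that starts with PRAGMA; the function name dump_plain_sql is hypothetical.

```python
import sqlite3


def dump_plain_sql(db_path, sql_path):
    """Write a SQL dump of db_path to sql_path, skipping PRAGMA statements."""
    conn = sqlite3.connect(db_path)
    try:
        with open(sql_path, "w", encoding="utf-8") as out:
            # iterdump() yields the dump one SQL statement at a time.
            for statement in conn.iterdump():
                if statement.upper().startswith("PRAGMA"):
                    continue
                out.write(statement + "\n")
    finally:
        conn.close()
```

For this repo the call would be dump_plain_sql("output/movies.sqlite", "output/movies.sql"). The output is still wrapped in BEGIN TRANSACTION; and COMMIT; statements, which other database engines may or may not accept unchanged.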

Neo4j database export

To export the Neo4j database to file, stop the database server and then use the admin command to dump it to a file:

/path/to/neo4j-admin dump --database=neo4j --to=/path/to/output/movies.neo4j.dump

Tutorials

There are tutorials for using the two main databases for the course: relational database tutorial; document database tutorial.
