
BODS Collector

A small tool to help collect data from the UK Government's Bus Open Data Service (BODS). Currently, this tool repeatedly grabs the latest location information from the BODS Location API for all buses of a given operator.

The tool has two modes:

  1. Save each update to a JSON file (e.g. for hosting), overwriting each time.
  2. Save each update to a PostgreSQL database.
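At its core, the tool is a polling loop: fetch the feed, write it out, sleep, repeat. A minimal sketch of that loop, using only the standard library — the endpoint URL, query parameter names, and helper functions here are illustrative assumptions, not the tool's actual implementation:

```python
import time
import urllib.parse
import urllib.request

# Hypothetical endpoint -- check the BODS API documentation for the
# real datafeed URL and query parameter names.
BODS_URL = "https://data.bus-data.dft.gov.uk/api/v1/datafeed/"


def feed_url(api_key: str, operator_code: str) -> str:
    """Build the query URL for one operator's location feed."""
    query = urllib.parse.urlencode(
        {"operatorRef": operator_code, "api_key": api_key}
    )
    return f"{BODS_URL}?{query}"


def poll(api_key: str, operator_code: str, output_path: str,
         sleep_interval: int = 15) -> None:
    """Repeatedly fetch the feed and overwrite the output file."""
    while True:
        with urllib.request.urlopen(feed_url(api_key, operator_code)) as resp:
            data = resp.read()
        with open(output_path, "wb") as f:  # overwrite on each update
            f.write(data)
        time.sleep(sleep_interval)
```

In DB mode, the body of the loop would insert each update into PostgreSQL instead of (or as well as) overwriting the file.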

Requirements

  • Python 3.6+
  • Docker

Note that the guide below assumes Linux. The tool should run fine on Windows, but this hasn't been tested, and some commands, such as activating the virtual environment, will differ slightly.

Usage

usage: bus_data_downloader.py [-h] [--db] [--aws]
                              [--aws_filename AWS_FILENAME]
                              [--sleep_interval SLEEP_INTERVAL]
                              operator_code output_path

Tool to collect and publish the latest BODS data for a given operator.

positional arguments:
  operator_code         The BODS operator code to grab.
  output_path           Location to save each update to.

optional arguments:
  -h, --help            show this help message and exit
  --db                  Save each update to a database. (default: False)
  --aws                 Push to S3 Bucket on each update. (default: False)
  --aws_filename AWS_FILENAME
                        Name to push to S3 bucket. (default:
                        current_bus_locations.json)
  --sleep_interval SLEEP_INTERVAL
                        How many seconds to sleep between each pull from the
                        API. (default: 15)

Setup

To use this tool, you will need a BODS API key. To get one, sign up for an account on the BODS site.

You will also need to set up credentials.py, plus .env and db.env if you want to use PostgreSQL.

Setting up your virtual environment

As always, it's best to set up a virtual environment. After changing into this repo's directory, run:

python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt

Note that if you install on macOS, you may encounter an issue building psycopg2. If so, you can install OpenSSL with Homebrew and build as follows:

env LDFLAGS="-I/usr/local/opt/openssl/include -L/usr/local/opt/openssl/lib" pip install psycopg2

Setting up PostgreSQL

You can skip this section if you only want to output JSON.

If you want to use your own, already-hosted Postgres database, just copy credentials.py.tmpl to credentials.py and fill in the username, password, host and port.

If you want to use a Docker-hosted Postgres database, fill in .env.tmpl and db.env.tmpl to make .env and db.env files.

.env:

LOCAL_PORT=the port you want to expose locally for the database
LOCAL_PATH=the path you want to store the data in, or just a name such as pgdata.

db.env:

POSTGRES_USER=the database username (pick what you want!)
POSTGRES_DB=the database name (pick what you want!)
POSTGRES_PASSWORD=the database password (make it good!)

Once you have set these files up, run:

docker-compose up -d

This will set up your database.

Push to AWS

To push to AWS, set up your AWS credentials using the AWS CLI tool (for example, via `aws configure`).

Credentials.py

Use the template credentials.py.tmpl and fill in your BODS API key.

If you are using Postgres, also fill in your database details, making sure that they are in quotes and match the ones defined in the environment files above.

If you want to push to an S3 bucket, then make sure to set your bucket name.
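A filled-in credentials.py might look like the following. All values are placeholders, and the exact variable names come from credentials.py.tmpl, so treat this as illustrative only:

```python
# credentials.py -- illustrative sketch; use the variable names from
# credentials.py.tmpl. All values here are placeholders.
BODS_API_KEY = "your-bods-api-key"

# Only needed if you use --db. These must match the values
# defined in .env and db.env.
DB_USER = "bods"
DB_PASSWORD = "a-strong-password"
DB_HOST = "localhost"
DB_PORT = "5432"
DB_NAME = "bods_data"

# Only needed if you use --aws.
S3_BUCKET = "my-bus-data-bucket"
```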

Setting up the Database

Again, skip this if you only want to output JSON.

Next we need to create the table for storing the data. To do this, run:

python3 bus_data_models.py

This will connect to the database and set up the required table.
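To illustrate what bus_data_models.py does conceptually: it issues a CREATE TABLE against the database. The real script targets Postgres and its actual schema may differ; this sketch shows the same idea against an in-memory SQLite database, with a guessed set of columns:

```python
import sqlite3

# Guessed schema for illustration only -- bus_data_models.py defines
# the real table, and it targets Postgres rather than SQLite.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS bus_locations (
    id INTEGER PRIMARY KEY,
    operator_code TEXT NOT NULL,
    vehicle_ref TEXT NOT NULL,
    latitude REAL,
    longitude REAL,
    recorded_at TEXT
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(CREATE_TABLE)
conn.commit()
```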

Running the Tool

You will need to find the operator code for the operator you want to collect data on. You can find these on the Traveline NOC Database.

To run just in JSON mode:

python3 bus_data_downloader.py [OPERATOR CODE] [JSON_PATH]

To run in DB mode too:

python3 bus_data_downloader.py [OPERATOR CODE] [JSON_PATH] --db

To push to AWS:

python3 bus_data_downloader.py [OPERATOR CODE] [JSON_PATH] --aws


Known Issues

Add JSON metadata

Add some metadata to the JSON file so it is easier to see which operator this is for, and the time at which the file was generated.
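One way this could look — the field names here are a suggestion, not the project's final format:

```python
import json
from datetime import datetime, timezone


def with_metadata(locations: list, operator_code: str) -> str:
    """Wrap the location list with operator and timestamp metadata.

    Field names are illustrative, not the project's agreed format.
    """
    payload = {
        "operator_code": operator_code,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "locations": locations,
    }
    return json.dumps(payload)
```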

Handle cases where BODS returns invalid data

BODS sometimes returns empty or invalid data, causing a crash when it is converted to JSON. This should be caught, and the save/push-to-S3/push-to-DB steps skipped for that update.
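A defensive sketch of that guard — what counts as "invalid" here (empty body, unparseable payload, no records) is an assumption, not the tool's exact check:

```python
import json


def parse_update(raw: str):
    """Return the parsed update, or None if the payload is empty or invalid.

    Callers should skip the save/push steps when None is returned.
    The validation rules here are assumptions for illustration.
    """
    if not raw or not raw.strip():
        return None  # empty response body
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # unparseable payload
    if not data:
        return None  # parsed, but contains no records
    return data
```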
