datalake's Introduction

Data Lake Project - Public Transport

Project for CS4225 Big Data Systems for Data Science AY2021/22.

Cloud Architecture

[Image: Architecture]

Folders in repository

  1. Athena: SQL for creating the Athena tables and querying them
  2. EMR: PySpark scripts
  3. Frontend: ReactJS application
  4. Lambda: two examples of the Lambda functions used
  5. Media: screenshots and photos related to the project
  6. Raw Data Examples S3: an example of the raw data output from each API

Grafana for Charts in the frontend

We used Grafana to generate the charts shown in the frontend.

[Image: grafana-frontend]

Frontend

The frontend displays the graphs and two map layers (taxis and congestion) that can be toggled on and off.

[Image: frontend1]

System Monitoring

EMR metrics were exported to Prometheus and visualized in Grafana. The cluster used 1 master node and 2 core (slave) nodes.

[Image: monitoring]

datalake's People

Contributors

ongweisheng, sheehui, thespacecuber

datalake's Issues

ML & Data Transformation

  1. Test out models and algorithms for basic time series prediction.

  2. Decide on the data transformation: what the processed data should look like, and whether the frontend can work with it.

AWS Setup (Grafana)

  1. Test embedding charts, check how to output to frontend

  2. Test with Prometheus for system monitoring

Algo/Logic

  1. Crowdedness: given the data for a single point in time, how do we derive crowdedness?
  2. Prediction: which algorithm should we use to predict?
  • Which analyses do we want to do, and for which areas?
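
One possible starting point for the crowdedness question, offered as a sketch only and not as the method the project settled on: bucket the positions in a single snapshot into a fixed latitude/longitude grid and count the points in each cell. The function names, the 0.01 degree cell size, and the sample points below are all assumptions made for illustration.

```python
import math
from collections import Counter


def grid_cell(lon, lat, cell_deg=0.01):
    """Map a position to an integer (column, row) grid cell.

    cell_deg is a hypothetical cell size in degrees, chosen for illustration.
    """
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def crowdedness(points, cell_deg=0.01):
    """Count how many (lon, lat) points of one snapshot fall in each cell."""
    return Counter(grid_cell(lon, lat, cell_deg) for lon, lat in points)


# Hypothetical snapshot: two positions close together, one further away.
snapshot = [(103.8583, 1.3559), (103.8579, 1.3557), (103.8712, 1.3601)]
counts = crowdedness(snapshot)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

A count per cell is easy to compute per snapshot in Spark as a group-by, and the per-cell series over time could then feed whichever prediction algorithm is chosen.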

Spark

  • Spark has an ML library

  • End goal is to get from raw to processed

  • Write out logic

Notes:
Spark 2.4.8 on Hadoop 2.10.1 YARN and Zeppelin 0.10.0

Frontend

  • Input JSON format:
{
  "type": "Congestion Layer",
  "features": [
    {
      "geometry": {
        "type": "LineString",
        "coordinates": [
          [103.858333937416, 1.3559533317473, 0.0],
          [103.858215578815, 1.355816304599, 0.0],
          [103.858116866331, 1.35575566979974, 0.0],
          [103.857992826192, 1.35571405765487, 0.0],
          [103.85787572257, 1.35572105501446, 0.0],
          [103.85778107993, 1.35577301170758, 0.0],
          [103.857716551157, 1.35585094776557, 0.0],
          [103.857586091965, 1.35610979081088, 0.0]
        ]
      },
      "type": "Feature",
      "properties": { "Level": 2 }
    },
    {
      "geometry": {
        "type": "LineString",
        "coordinates": [
          [103.857586091965, 1.35610979081088, 0.0],
          [103.857404406124, 1.35688247969145, 0.0],
          [103.857375904766, 1.35727093700206, 0.0],
          [103.857419882646, 1.35766252359057, 0.0],
          [103.857535269587, 1.35832639998177, 0.0]
        ]
      },
      "type": "Feature",
      "properties": { "Level": 2 }
    }
  ]
}
  • Embed dummy Grafana Graph
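
A minimal sketch of how the processed output could be assembled into the input format above and checked before it is handed to the frontend. The helper names (make_feature, make_congestion_layer, validate_layer) are hypothetical, and the checks only mirror what the sample shows: a "Congestion Layer" type, LineString geometries whose points are [longitude, latitude, altitude], and a "Level" property on each feature.

```python
import json


def make_feature(coords, level):
    """Build one LineString feature from (lon, lat) pairs and a congestion level."""
    return {
        "geometry": {
            "type": "LineString",
            # The sample stores each point as [longitude, latitude, altitude].
            "coordinates": [[lon, lat, 0.0] for lon, lat in coords],
        },
        "type": "Feature",
        "properties": {"Level": level},
    }


def make_congestion_layer(segments):
    """Wrap (coords, level) segments in the top-level layer object."""
    return {
        "type": "Congestion Layer",
        "features": [make_feature(coords, level) for coords, level in segments],
    }


def validate_layer(layer):
    """Return a list of problems; an empty list means the layer matches the sample's shape."""
    problems = []
    if layer.get("type") != "Congestion Layer":
        problems.append("unexpected layer type")
    for i, feature in enumerate(layer.get("features", [])):
        geometry = feature.get("geometry", {})
        coords = geometry.get("coordinates", [])
        if geometry.get("type") != "LineString":
            problems.append(f"feature {i}: geometry is not a LineString")
        if len(coords) < 2:
            problems.append(f"feature {i}: a line needs at least two points")
        if any(len(point) != 3 for point in coords):
            problems.append(f"feature {i}: points must be [lon, lat, alt]")
        if "Level" not in feature.get("properties", {}):
            problems.append(f"feature {i}: missing Level property")
    return problems


layer = make_congestion_layer(
    [([(103.858333937416, 1.3559533317473), (103.858215578815, 1.355816304599)], 2)]
)
print(validate_layer(layer))
print(json.dumps(layer, indent=2))
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that is wrong with a batch in a single pass.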

AWS Setup (EMR)

Test EMR with S3 & Athena with S3

Set up a proper flow of data first, with a working example

Look into how to avoid re-reading old raw data
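
One way to approach the re-reading question, sketched under assumptions: keep a checkpoint holding the last raw object key that was processed, and on each run select only the keys that sort after it. This relies on the raw keys embedding a sortable timestamp so that lexicographic order matches arrival order. The key layout and the function name below are hypothetical and do not describe the project's actual S3 layout.

```python
def select_new_keys(all_keys, checkpoint):
    """Return (new_keys, new_checkpoint) for one processing run.

    all_keys:   every raw object key currently under one prefix
    checkpoint: the last key processed by the previous run, or None on the first run
    """
    new_keys = sorted(k for k in all_keys if checkpoint is None or k > checkpoint)
    # If nothing new arrived, the checkpoint stays where it was.
    new_checkpoint = new_keys[-1] if new_keys else checkpoint
    return new_keys, new_checkpoint


# Hypothetical key layout: <prefix>/<ISO-like timestamp>.json
keys = [
    "raw/taxi/2021-10-01T12-00-00.json",
    "raw/taxi/2021-10-01T12-05-00.json",
    "raw/taxi/2021-10-01T12-10-00.json",
]

first_batch, checkpoint = select_new_keys(keys[:2], None)
second_batch, checkpoint = select_new_keys(keys, checkpoint)
print(first_batch)
print(second_batch)
```

The checkpoint would need to be stored somewhere durable between runs and tracked per prefix, since keys from different APIs do not share one ordering.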
