
Ultimate-ai challenge

The objective is to process Twitter's live tweet stream: accumulate tweets into 20-second micro-batches and merge each batch with the total coronavirus case count from worldometers.info. The processed micro-batches, enriched with the coronavirus case information, will be used by data scientists to predict the number of potential customers.
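The enrichment step described above can be sketched as a plain function, independent of Spark. The function name, record shape, and field names here are illustrative assumptions, not the project's actual schema:

```python
from datetime import datetime, timezone

def enrich_micro_batch(tweets, corona_case_count):
    """Merge a 20-second micro-batch of tweets with the latest
    coronavirus case count (hypothetical record shape)."""
    batch_time = datetime.now(timezone.utc).isoformat()
    return [
        {
            "tweet": tweet,
            "corona_case_count": corona_case_count,
            "batch_time": batch_time,
        }
        for tweet in tweets
    ]
```

In the real application this merge would run inside the micro-batch handler before the batch is written to MongoDB.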

Assumptions

Sources may vary, but the sink (MongoDB) is the same for storing the processed real-time events.

Events in MongoDB

(Screenshot: processed events stored in MongoDB.)

Implementation

Design

Implemented using the factory method pattern: StreamProcessor provides an interface with concrete methods common to all stream processors, and can be extended when adding new sources, e.g. TwitterStreamProcessor.

Methods to implement for a new source:

  • process - pre-processing/data processing on a single event

Subclasses can alter the objects returned by the following factory methods:

  • process_micro_batch
  • write_stream
  • write
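A minimal sketch of this factory method design, assuming the class and method names listed above; the method bodies and event format are illustrative, not the project's actual implementation:

```python
from abc import ABC, abstractmethod

class StreamProcessor(ABC):
    """Interface with concrete methods common to all stream processors."""

    @abstractmethod
    def process(self, event):
        """Pre-process/process a single event (implemented per source)."""

    # Factory methods that subclasses can override to alter the
    # objects they return.
    def process_micro_batch(self, batch):
        return [self.process(event) for event in batch]

    def write_stream(self, batch):
        return self.write(self.process_micro_batch(batch))

    def write(self, records):
        # The real sink is MongoDB; stubbed here for illustration.
        return records

class TwitterStreamProcessor(StreamProcessor):
    """Example of adding a new source: only `process` must be implemented."""

    def process(self, event):
        return {"text": event.strip().lower()}
```

Adding another source (e.g. a Kafka processor) then only requires a new subclass with its own `process`.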

Cache Implementation

This application holds a simple cache (a key-value store) to look up the corona case count when "https://www.worldometers.info/coronavirus/" is not reachable or times out. The initial value is set to -1, and on each successful request the cache is updated.

  1. Whenever there is a request exception (assuming a short hiccup), the corona case count is retrieved from the cache.

  2. "https://www.worldometers.info/coronavirus/" is an external site, and we have no control over changes to it. If the HTML cannot be parsed, corona_case_count is set to -1, which signals (and should be monitored) that the application needs adjusting, without stopping event processing or incurring downtime.
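The two fallback rules above can be sketched as follows. The parser is a placeholder (the real page's markup is external and may change), and the function names are assumptions:

```python
import re
import requests

# Simple in-process key-value cache; initial value is -1, per the design.
CACHE = {"corona_case_count": -1}

def parse_case_count(html):
    """Hypothetical parser: pull the first large number out of the page.
    Returns None when the markup cannot be parsed."""
    match = re.search(r"([\d,]{4,})", html)
    return int(match.group(1).replace(",", "")) if match else None

def get_corona_case_count(url, timeout=5):
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.RequestException:
        # Rule 1: request exception (short hiccup) -> serve from the cache.
        return CACHE["corona_case_count"]
    count = parse_case_count(response.text)
    if count is None:
        # Rule 2: HTML changed and could not be parsed -> set -1 so
        # monitoring notices, without stopping event processing.
        CACHE["corona_case_count"] = -1
        return -1
    # Successful request: update the cache.
    CACHE["corona_case_count"] = count
    return count
```

Because the cache lives in the driver process, each Spark job keeps its own copy; the Memcached alternative below would share one value across jobs.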

A more distributed and robust solution would be to use an external key-value store, e.g. Memcached, without significantly increasing the complexity of the architecture.

Running in Production

In Spark, there can be only one SparkSession per JVM, so the application is designed to run as a separate job for each source:

spark-submit --master yarn --deploy-mode cluster  --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 main.py --jobname ultimate_ai_socket_stream_processing --source socket
spark-submit --master yarn --deploy-mode cluster  --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 main.py --jobname ultimate_ai_kafka_stream_processing --source kafka

This also enables better failure management, operations, and monitoring than a single tightly coupled application.
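The `--jobname`/`--source` dispatch in `main.py` might look like the sketch below. The processor-class names are assumptions; the flags and source values come from the spark-submit commands above:

```python
import argparse

# Hypothetical mapping from --source value to processor class name;
# one Spark job per source, since a JVM hosts only one SparkSession.
PROCESSORS = {
    "socket": "SocketStreamProcessor",
    "kafka": "KafkaStreamProcessor",
}

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="ultimate-ai stream job")
    parser.add_argument("--jobname", required=True)
    parser.add_argument("--source", required=True, choices=sorted(PROCESSORS))
    return parser.parse_args(argv)
```

Each submitted job then instantiates only the processor matching its `--source` flag.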

Steps to run

docker-compose up --build -d            # build and start all services
docker-compose down --remove-orphans    # stop and clean up
