
Parse data using Kafka streams with YARN

This is a distributed streaming application that reads COVID-19 data using Python and pipes it into Kafka streams. Jobs are run with YARN, Samza, and ZooKeeper on Hadoop 2.0.

RUNNING THE CODE

1. Prerequisites

a) Make sure JAVA_HOME is set in your .bashrc:

echo $JAVA_HOME

b) Make sure you have write permission in the /home directory, or wherever you are cloning the repo:

git clone https://github.com/Shivansh1805/KWHY.git

2. Give execute permission to the scripts listed below

chmod +x /home/path/to/KWHY/deploy-script/grid
chmod +x /home/path/to/KWHY/deploy/bin/run-job.sh
chmod +x /home/path/to/KWHY/deploy/bin/run-class.sh

If you face a permission-related problem with any other script, make it executable in the same way.

3. Build the Samza distribution artifacts in the deploy directory

mvn clean install

4. Run the grid script in bootstrap mode

Bootstrap mode automatically installs and starts ZooKeeper, Kafka, and YARN:

deploy-script/grid bootstrap

To start/stop ZooKeeper, YARN, and Kafka with a single command:

deploy-script/grid start all
deploy-script/grid stop all

To start/stop ZooKeeper, YARN, and Kafka separately:

deploy-script/grid start/stop kafka
deploy-script/grid start/stop yarn
deploy-script/grid start/stop zookeeper

Note: If you hit an error while starting Kafka, just remove the kafka-logs directory from /tmp/ and start it again.

5. Update paths according to your local system

Update the path in both files in the src/resources directory:

yarn.package.path=/home/path/to/KWHY/deploy/deploy.tar.gz
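
For reference, a Samza job file loaded through PropertiesConfigFactory typically contains entries like the fragment below. Everything except yarn.package.path is an illustrative assumption about a standard Samza-on-YARN setup and may not match the actual covid-parser.properties shipped in this repo; only the yarn.package.path line needs to point at your local clone.

# Illustrative fragment only -- key names other than yarn.package.path are assumed.
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
job.name=covid-parser
yarn.package.path=/home/path/to/KWHY/deploy/deploy.tar.gz
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.consumer.zookeeper.connect=localhost:2181
systems.kafka.producer.bootstrap.servers=localhost:9092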

6. Run the Samza deployment script with the correct paths and attributes

/home/path/to/KWHY/deploy/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=/home/path/to/KWHY/deploy/conf/covid-parser.properties

Open a browser and go to the local YARN ResourceManager at http://localhost:8088/cluster to check the job status.

7. Create consumers in two different terminals after submitting the job to YARN.

/home/path/to/KWHY/deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic district-data

/home/path/to/KWHY/deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic state-data
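
If you prefer to check the topics programmatically instead of using the console consumer, a minimal sketch with the kafka-python client would look like the following. kafka-python is not part of this repo (install it with pip3 install kafka-python), and the broker address assumes the Kafka instance started by the grid script.

# Programmatic alternative to kafka-console-consumer.sh.
# Assumes kafka-python and the broker started by the grid script on localhost:9092.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "district-data",
    "state-data",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
)

for message in consumer:
    print(message.topic, message.value.decode("utf-8"))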

8. Start the Kafka stream

The Python script pulls data from the COVID-19 API, pipes it to a Kafka producer, and publishes to the 'district-data' and 'state-data' topics (a sketch of what the script does appears at the end of this step). Go to the producer directory and run the command below.

python3 covid-stream.py 

If you encounter an error due to an improper YARN shutdown, simply delete the /tmp/kafka-logs directory.
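
The sketch below shows roughly what covid-stream.py does. The kafka-python client, the covid19india.org endpoint, and the field names are assumptions for illustration; the actual script in the producer directory may differ.

# Illustrative sketch of covid-stream.py -- not the repo's actual implementation.
# Assumes: pip3 install kafka-python requests, and a broker on localhost:9092.
import json

import requests
from kafka import KafkaProducer

API_URL = "https://api.covid19india.org/state_district_wise.json"  # assumed endpoint

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

data = requests.get(API_URL, timeout=30).json()

# Publish one message per state to 'state-data' and one per district to 'district-data',
# the two topics the console consumers above are subscribed to.
for state, details in data.items():
    districts = details.get("districtData", {})
    state_total = sum(d.get("confirmed", 0) for d in districts.values())
    producer.send("state-data", {"state": state, "confirmed": state_total})
    for district, stats in districts.items():
        producer.send("district-data", {
            "state": state,
            "district": district,
            "confirmed": stats.get("confirmed", 0),
        })

producer.flush()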

9. See the streamed results in both terminals where you launched the consumers.

NOTE

Before pushing the code, clean up the local build artifacts:

rm -rf deploy
mvn clean
