vaquarkhan / kafka_batch_processing_using_spark_sample

This project is forked from scriperdj/kafka_batch_processing_using_spark_sample

Use Spark in batch mode to process messages in multi-partitioned Kafka topics

Scala 100.00%

kafka_batch_processing_using_spark_sample's Introduction

Example Spark application for batch processing of multi-partitioned Kafka topics

This example application takes a Kafka topic and broker details as arguments and performs the following operations:

Get the partition and offset details of the provided Kafka topics.
Create a DataFrame from the data read.
Find the top trending product in each category, based on users' browsing data.
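
The third step above can be illustrated with a minimal sketch. It uses plain Scala collections rather than a Spark DataFrame, so it only shows the shape of the aggregation and is not the code this project runs. `PageView`, `TopTrending` and `topProductPerCategory` are hypothetical names introduced here; the fields mirror the sample JSON messages used later in this README, and "top trending" is assumed to mean "most viewed".

```scala
// Hypothetical record type: one browsing event, with the fields from the sample JSON.
final case class PageView(customer: String, product: String, category: String, ts: Long)

object TopTrending {

  /** For each category, return the most-viewed product together with its view count.
    * Assumption: "trending" means the highest number of views in the batch.
    */
  def topProductPerCategory(views: Seq[PageView]): Map[String, (String, Int)] =
    views
      .groupBy(_.category)
      .map { case (category, inCategory) =>
        // Count views per product within this category.
        val counts = inCategory.groupBy(_.product).map { case (product, vs) => (product, vs.size) }
        // Highest count first; ties are broken by product id so the result is deterministic.
        val best = counts.toSeq.sortBy { case (product, n) => (-n, product) }.head
        category -> best
      }
}
```

Calling `TopTrending.topProductPerCategory` on a batch of events yields one `(product, count)` pair per category. In Spark the same idea would typically be a count grouped by category and product, followed by a ranking within each category.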

Writing the output to a sink is not implemented. The application only displays the top 10 product/category results in the logs.

Please read the blog post here for an introduction to the problem and the approach taken.

Build

> sbt package

Execution

Create a Kafka topic with the required number of partitions and other configuration

> kafka-topics --zookeeper localhost:2181 --replication-factor 1 --create --partitions 5 --topic web_stream --config retention.ms=604800000

Generate sample data for the topic

> kafka-console-producer --broker-list localhost:9092 --topic web_stream  --property parse.key=true --property key.separator=,
> cus_001,{"product":"PD0021","category": "Books","ts":"1516978415"}
> cus_001,{"product":"PD0022","category": "TV","ts":"1517978415"}
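
Typing records by hand gets tedious for larger tests. The sketch below prints lines in the same `key,json` shape as the two sample records above, so they can be pasted or piped into the console producer. `SampleData` and `line` are hypothetical names, and the customer, product and category values in `main` are made up for illustration.

```scala
object SampleData {

  /** Format one event as `key,json`, matching the sample records:
    * the customer id is the message key and the JSON document is the value.
    */
  def line(customer: String, product: String, category: String, ts: Long): String =
    s"""$customer,{"product":"$product","category": "$category","ts":"$ts"}"""

  def main(args: Array[String]): Unit = {
    // Made-up values, for illustration only.
    val categories = Seq("Books", "TV")
    val startTs = 1516978415L
    for (i <- 0 until 10) {
      val customer = f"cus_${i % 3 + 1}%03d"
      val product = f"PD${21 + i % 4}%04d"
      println(line(customer, product, categories(i % categories.size), startTs + i))
    }
  }
}
```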

Submit the Spark job

> spark-submit --packages org.apache.kafka:kafka-clients:0.11.0.0,org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.0 --class example.spark.BatchProcessKafka --master yarn target/scala-2.11/kafka_batch_processing_using_spark_sample_2.11-1.0.jar web_stream localhost:9092
