Giter VIP home page Giter VIP logo

stefen-taime / kafka-pipeline Goto Github PK

View Code? Open in Web Editor NEW
5.0 2.0 3.0 2.71 MB

In the following post, we will learn how to build a data pipeline using a combination of open-source software (OSS), including Debezium, Apache Kafka, Kafka Connect.

Shell 43.39% Dockerfile 28.86% Python 27.75%
bash data docker elasticsearch etl-pipeline k kafka kafka-connect kafka-streams kafka-topic kibana ksqldb masking mongodb mysql pii pipeline postgresql

kafka-pipeline's Introduction

Kafka Connect Pipeline

Architecture

Creating infrastructure...

chmod a+x manage.sh
sh manage.sh up

Image

Adminer => localhost:8085

[mysql] Host: mysql user: root password: kafkademo db: demo

Image

Then run MySQL Source:

http POST localhost:8083/connectors @connect/mysql-source.json
./bin/kafka-avro-console-consumer --topic mysql.demo.customers --from-beginning

Image

Then run Postgres Sink:

http POST localhost:8083/connectors @connect/postgres-sink.json

Image

[postgres] Host: postgres user: postgres password: kafkademo db: demo Image

Then fill with data:

docker exec -ti mongo mongo -u debezium -p dbz --authenticationDatabase admin localhost:27017/demo

PRIMARY> show collections;
PRIMARY> db.user_actions.insert({"userId": 2, "ts": new Date(), "ip": "192.168.0.8"}),
         db.user_actions.insert({"userId": 3, "ts": new Date(), "ip": "192.168.233.255"}),
         db.user_actions.insert({"userId": 4, "ts": new Date(), "ip": "192.168.0.0.1"}),
         db.user_actions.insert({"userId": 5, "ts": new Date(), "ip": "10.0.0.15"}),
         db.user_actions.insert({"userId": 6, "ts": new Date(), "ip": "10.20.33.4"}),
         db.user_actions.insert({"userId": 7, "ts": new Date(), "ip": "192.255.30.4"}),
         db.user_actions.insert({"userId": 8, "ts": new Date(), "ip": "10.0.0.182"}),
         db.user_actions.insert({"userId": 9, "ts": new Date(), "ip": "10.0.0.10"}),
         db.user_actions.insert({"userId": 10, "ts": new Date(), "ip": "168.0.0.1"}),
         db.user_actions.insert({"userId": 11, "ts": new Date(), "ip": "10.10.255.2"}),
         db.user_actions.insert({"userId": 12, "ts": new Date(), "ip": "168.255.3.4"})

Finally, setup source and sink:

http POST localhost:8083/connectors @connect/mongo-source.json 
http POST localhost:8083/connectors @connect/postgres-mongo-sink.json

Image

Events + PII masking

http POST localhost:8083/connectors @connect/postgres-mongo-sink-no-pii.json

Image

Elasticsearch

Setup connector:

http POST localhost:8083/connectors @connect/elastic-sink.json

Check results:

curl http://localhost:9200/mysql.demo.customers/_search?pretty=true&q=*:*

Image

KSQL processing

Connect to console:

docker exec -ti ksql ksql

Add sample query

set 'auto.offset.reset'='earliest';
create stream customers with(kafka_topic='mysql.demo.customers', value_format='AVRO');
create stream addressLine1_changed_notification with (value_format='JSON') as 
    select before->addressLine1 rcpt, concat('Your address was changed to ', after->addressLine1) message
from customers where before->addressLine1 <> after->addressLine1;

update AddressLine1 column (Table Customers)

Image

Check the data:

./bin/kafka-console-consumer --topic ADDRESSLINE1_CHANGED_NOTIFICATION --from-beginning

Image

Inspiration

https://github.com/szczeles/kafkaconnect-demo

kafka-pipeline's People

Contributors

stefen-taime avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.