Kafka - how to install and play with it

Running the installation

First, make sure that you have Ansible, Vagrant and the vagrant-libvirt plugin installed. Here are the instructions to do this on an Ubuntu-based system.

sudo apt-get update 
sudo apt-get install \
  libvirt-daemon \
  libvirt-clients \
  virt-manager \
  python3-pip \
  python3-libvirt \
  vagrant \
  vagrant-libvirt \
  git \
  openjdk-8-jdk
sudo adduser $(id -un) libvirt
sudo adduser $(id -un) kvm
pip3 install ansible lxml pyopenssl

You also need to download the Kafka distribution to your local machine and unzip it there - first, to have a test client locally, and second because we use a Vagrant-synced folder to replicate this archive into the virtual machines, so that it does not have to be downloaded once for every broker.

wget http://mirror.cc.columbia.edu/pub/software/apache/kafka/2.4.1/kafka_2.13-2.4.1.tgz
tar xvf kafka_2.13-2.4.1.tgz
mv kafka_2.13-2.4.1 kafka

Now you can start the installation simply by running

virsh net-define kafka-private-network.xml
vagrant up
ansible-playbook site.yaml

Note that we map the local working directory into the virtual machines, so that the installation can use the already downloaded Kafka tarball. This synchronization is done using rsync and apparently runs only once, so it is important that the downloaded tarball is in place before you run the playbook.
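For reference, the relevant entry in a Vagrantfile typically looks like the following (a sketch only, assuming the rsync synced-folder type that vagrant-libvirt uses by default; the actual Vagrantfile in this repository may differ):

# Sketch - sync the working directory (including the unpacked Kafka
# distribution) into each VM once via rsync
config.vm.synced_folder ".", "/vagrant", type: "rsync"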

Once the installation completes, it is time to run a few checks. First, let us verify that ZooKeeper is running correctly on each node. For that purpose, SSH into the first node using vagrant ssh broker1 and run

/usr/share/zookeeper/bin/zkServer.sh status

This should print out the configuration file used by ZooKeeper as well as the mode the node is in (follower or leader). Note that the ZooKeeper server is listening for client connections on port 2181 on all interfaces, i.e. you should also be able to connect to the server from the lab PC as follows (keep this in mind if you are running in an exposed environment; you might want to set up a firewall if needed)

ip=$(virsh domifaddr kafka_broker1 \
  | grep "ipv4" \
  | awk '{ print $4 }' \
  | sed 's/\/24//')
echo srvr | nc $ip 2181

or, via the private network

for i in {1..3}; do
    echo srvr | nc 10.100.0.1$i 2181
done

Each node should also be able to reach every other node via its hostname on port 2181, i.e. on each node, you should be able to run

for i in {1..3}; do
    echo srvr | nc broker$i 2181
done

with the same results.

Now let us see whether Kafka is running on each node. First, of course, you should check the status using systemctl status kafka. Then, we can see whether all brokers have registered themselves with ZooKeeper. To do this, run

sudo /usr/share/zookeeper/bin/zkCli.sh -server broker1:2181 ls /brokers/ids

on any of the broker nodes. You should get a list with the broker ids of the cluster, i.e. usually [1,2,3]. Next, log into one of the brokers and try to create a topic.

/opt/kafka/kafka_2.13-2.4.1/bin/kafka-topics.sh \
  --create \
  --bootstrap-server broker1:9092 \
  --replication-factor 3 \
  --partitions 2 \
  --topic test

This will create a topic called test with two partitions and a replication factor of three, i.e. each broker node will hold a copy of the log. When this command completes, you can check that corresponding directories (one for each partition) have been created in /tmp/kafka-logs on every node.
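For instance, on any broker node, the following command should list the two partition directories test-0 and test-1:

ls -d /tmp/kafka-logs/test-*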

Let us now try to write a message into this topic. Again, we run this on the broker:

/opt/kafka/kafka_2.13-2.4.1/bin/kafka-console-producer.sh \
  --broker-list broker1:9092 \
  --topic test

Enter some text and hit Ctrl-D. Then, on some other node, run

/opt/kafka/kafka_2.13-2.4.1/bin/kafka-console-consumer.sh \
  --bootstrap-server broker1:9092 \
  --from-beginning \
  --topic test

You should now see what you typed.

Securing Kafka broker via TLS

In our setup, each Kafka broker will listen on the private interface with a PLAINTEXT listener, i.e. an unsecured listener. Of course, we can also add an additional listener on the public interface, so that we can reach the Kafka broker from a public network (in our setup using KVM, the "public" network is of course technically also just a local Linux bridge, but in a cloud setup, this will be different). To achieve this, we need to

  • create a public / private key pair for each broker
  • create a certificate for each broker and sign it
  • create a keystore for the broker, holding the signed certificate and the keys
  • create a truststore for the client, containing the CA used to sign the server certificate
  • create a key pair and certificate for the client, bundle this in a keystore and make it available to the client
  • create a properties file for the client containing the TLS configuration (see the example below)
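For illustration, the resulting client properties file typically contains entries like these (a sketch only; the store locations and passwords are placeholders and depend on how the playbook lays out the .state directory):

# Sketch of a client TLS configuration - locations and passwords are placeholders
security.protocol=SSL
ssl.truststore.location=.state/client.truststore.jks
ssl.truststore.password=changeit
ssl.keystore.location=.state/client.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit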

Once this has been done, you can now run the consumer locally and connect via the SSL listener on the public interface:

kafka/bin/kafka-console-consumer.sh   \
  --bootstrap-server $(./python/getBrokerURL.py)   \
  --consumer.config .state/client_ssl_config.properties \
  --from-beginning \
  --topic test 

Using the Python client

To be able to use the Python Kafka client, you will have to install the required packages on your local PC. Assuming that you have a working Python3 installation including pip3, simply run

pip3 install kafka-python

on your lab PC. For these notes, I have used version 2.0.1, so in case you want to use the exact same version, replace this with

pip3 install kafka-python==2.0.1

You can check which version you have by running cat ~/.local/lib/python3*/site-packages/kafka/version.py. Now you should be able to run the scripts in the python subdirectory. To simply create 10 messages for the test topic that we have created above, run (from the root of the repository)

python3 python/producer.py
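The script producer.py is part of the repository; for orientation, the core of such a producer built on kafka-python looks roughly like this (a sketch, not the repository's actual code, assuming a PLAINTEXT listener reachable as broker1:9092):

from kafka import KafkaProducer

# Sketch only - connect to an assumed PLAINTEXT listener
producer = KafkaProducer(bootstrap_servers="broker1:9092")
for i in range(10):
    # send() is asynchronous and returns a future
    producer.send("test", value=("message %d" % i).encode("utf-8"))
# block until all buffered messages have actually been sent
producer.flush()
producer.close()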

On the broker node, you can now verify that this works by running

/opt/kafka/kafka_2.13-2.4.1/bin/kafka-dump-log.sh \
  --print-data-log \
  --files /opt/kafka/logs/test-0/00000000000000000000.log

which will print the first segment of partition 0 of the test topic. Finally, you can test our consumer by running

python3 python/consumer.py

The consumer has a few flags which are useful for testing.

  • --no_commit - do not commit any messages, neither manually nor automatically
  • --disable_auto_commit - disable automated commits and commit explicitly after each batch of messages (see the sketch after this list)
  • --reset - do not read any messages, but seek all partitions of the test topic to the beginning and commit these offsets
  • --max_poll_records - number of records in a batch, default is one
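To illustrate the manual-commit variant, here is roughly what such a consumer looks like with kafka-python (again a sketch, not the repository's actual code; the broker address and group id are assumptions):

from kafka import KafkaConsumer

# Sketch only - broker address and group id are made up for illustration
consumer = KafkaConsumer(
    "test",
    bootstrap_servers="broker1:9092",
    group_id="test-group",
    auto_offset_reset="earliest",
    enable_auto_commit=False,  # corresponds to --disable_auto_commit
    max_poll_records=10,       # corresponds to --max_poll_records
)

# fetch one batch of records per partition
batch = consumer.poll(timeout_ms=1000)
for tp, records in batch.items():
    for record in records:
        print(tp.partition, record.offset, record.value)
# commit the offsets explicitly after processing the batch
consumer.commit()
consumer.close()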
