Lab 2: Kafka for Data Streaming

In this lab, you will gain hands-on experience with Apache Kafka, a distributed streaming platform that plays a key role in processing large-scale real-time data. You will establish a connection to a Kafka broker, produce and consume messages, and explore Kafka command-line tools. This lab will prepare you for your group project, where you'll work with Kafka streams.

To receive credit for this lab, show your work to the TA during recitation.

Deliverables

  • Establish a secure SSH tunnel to the Kafka server. Explain Kafka topics and offsets to the TA: how do they ensure message continuity if a consumer disconnects?
  • Modify the starter code to implement producer and consumer modes for a Kafka topic.
  • Use Kafka's CLI tools to manage and monitor Kafka topics and messages.
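You can rehearse the topic/offset question with a toy model. The sketch below is a simplified in-memory illustration, not Kafka's implementation. It models a topic partition as an append-only list, where each message's offset is its index. A consumer group records a committed offset, which is the position of the next message to read. A consumer that disconnects and later rejoins the same group resumes from that committed offset instead of starting over or skipping ahead. In real Kafka, messages that were read but not yet committed are delivered again. The class names are hypothetical, and so is the `reset_to_earliest` flag, which mimics the `auto_offset_reset` consumer setting.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyPartition:
    """Toy stand-in (hypothetical) for one topic partition: an append-only log."""
    log: List[str] = field(default_factory=list)

    def append(self, message: str) -> int:
        """Append a message and return its offset (its position in the log)."""
        self.log.append(message)
        return len(self.log) - 1


@dataclass
class ToyConsumerGroup:
    """Toy consumer group (hypothetical) tracking one committed offset for one partition."""
    partition: ToyPartition
    committed: Optional[int] = None  # offset of the next message to read; None = nothing committed yet
    reset_to_earliest: bool = True   # mimics auto_offset_reset="earliest" (True) vs "latest" (False)

    def poll(self, max_records: int) -> List[str]:
        if self.committed is None:
            # No committed offset yet: fall back to the reset policy.
            start = 0 if self.reset_to_earliest else len(self.partition.log)
        else:
            # Resume exactly where the group left off, even in a new consumer instance.
            start = self.committed
        batch = self.partition.log[start:start + max_records]
        self.committed = start + len(batch)  # commit the offset *after* the last record returned
        return batch
```

For example, after a group polls two of five messages, its committed offset is 2. A fresh consumer created with `committed=2` (a reconnect) then receives the remaining three messages, plus anything appended in the meantime.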

Getting started

  • Clone the starter code from this Git repository.
  • The repository includes a Python notebook with a Kafka producer and consumer.
  • Install the kafka-python package by running:
    python -m pip install kafka-python

Connecting to Kafka server

  1. Use SSH to create a tunnel to the Kafka server:
    ssh -L <local_port>:localhost:<remote_port> <user>@<remote_server> -NTf
  2. Test the Kafka server connection to ensure it's operational.
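You can script the two steps above. This sketch assumes you supply the placeholders from the `ssh` command and that the tunnel forwards local port 9092, the port used by the kcat example later in this lab. `ssh_tunnel_command` and `port_is_open` are hypothetical helper names. The check only confirms that something is listening on the local end of the tunnel. That proves the tunnel is up, but not that the broker behind it is healthy.

```python
import socket


def ssh_tunnel_command(local_port: int, remote_port: int, user: str, server: str) -> list:
    """Build the argv for the SSH tunnel command above (hypothetical helper)."""
    return [
        "ssh",
        "-L", f"{local_port}:localhost:{remote_port}",  # forward local_port to localhost:remote_port on the server
        f"{user}@{server}",
        "-NTf",  # -N: no remote command, -T: no pseudo-terminal, -f: go to background
    ]


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False
```

For example, `subprocess.run(ssh_tunnel_command(9092, 9092, "<user>", "<remote_server>"), check=True)` opens the tunnel. With `-f`, ssh moves to the background once authentication succeeds. After that, `port_is_open("localhost", 9092)` should return `True`. For a Kafka-level check, list the topics with kafka-python's `KafkaConsumer(bootstrap_servers=...).topics()`, which confirms that the broker itself responds.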

Implementing Producer-Consumer Mode

1. Producer Mode: Writes Data to Broker

Refer to the TODO sections in the starter code. Edit the bootstrap servers and add 2-3 cities of your choice, then run the code to write messages to the Kafka stream.
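A minimal kafka-python producer might look like the following. It is a sketch, not the starter notebook's code. The topic name, the record format (city plus UTC timestamp), and the helper names are hypothetical, and `localhost:9092` assumes your tunnel's local port. `KafkaProducer`, its `value_serializer` argument, `send()`, `flush()` and `close()` are part of kafka-python's API.

```python
import json
from datetime import datetime, timezone

BOOTSTRAP_SERVERS = ["localhost:9092"]  # assumption: local end of the SSH tunnel
TOPIC = "recitation-cities"             # hypothetical topic name; use the one from the starter code


def serialize(record: dict) -> bytes:
    """Encode a dict as UTF-8 JSON; Kafka message values are raw bytes."""
    return json.dumps(record).encode("utf-8")


def build_records(cities) -> list:
    """Hypothetical message format: one record per city, stamped with the current UTC time."""
    now = datetime.now(timezone.utc).isoformat()
    return [{"city": city, "timestamp": now} for city in cities]


def produce(cities, topic: str = TOPIC, servers=BOOTSTRAP_SERVERS) -> None:
    # Imported here so the helpers above can be used without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=servers, value_serializer=serialize)
    try:
        for record in build_records(cities):
            producer.send(topic, value=record)  # asynchronous: queues the message and returns a future
        producer.flush()  # block until every queued message has been sent
    finally:
        producer.close()
```

With the tunnel open, `produce(["Pittsburgh", "Seattle"])` sends one message per city. Because `send()` only queues messages, the explicit `flush()` matters in a short script that exits right away.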

2. Consumer Mode: Reads Data from Broker

Complete the TODO section by filling in the appropriate parameters/arguments in the starter code, then verify the contents of Kafka_log.csv.
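Here is a matching consumer sketch. Its topic, group id and CSV columns are hypothetical, so check the starter code for the real ones. Three `KafkaConsumer` settings matter most for the offsets question:
- `group_id` identifies whose committed offsets are tracked.
- `auto_offset_reset="earliest"` applies only when the group has no committed offset yet.
- `consumer_timeout_ms` ends the `for` loop after a period with no new messages instead of blocking forever.

```python
import csv
import json

BOOTSTRAP_SERVERS = ["localhost:9092"]  # assumption: local end of the SSH tunnel
TOPIC = "recitation-cities"             # hypothetical topic name
FIELDS = ["offset", "city", "timestamp"]  # hypothetical CSV columns


def deserialize(raw: bytes) -> dict:
    """Decode a UTF-8 JSON message value back into a dict."""
    return json.loads(raw.decode("utf-8"))


def write_rows(rows, fh) -> int:
    """Write a header plus one CSV line per row dict; unknown keys are ignored. Returns rows written."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def consume_to_csv(path: str = "Kafka_log.csv", topic: str = TOPIC, servers=BOOTSTRAP_SERVERS) -> int:
    from kafka import KafkaConsumer  # imported lazily, as in the producer sketch

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=servers,
        group_id="recitation-group",   # hypothetical group id; offsets are committed per group
        auto_offset_reset="earliest",  # start at the oldest message if the group has no committed offset
        enable_auto_commit=True,       # periodically commit offsets so a restart resumes where it stopped
        consumer_timeout_ms=10_000,    # stop iterating after 10 s without new messages
        value_deserializer=deserialize,
    )
    try:
        rows = ({"offset": msg.offset, **msg.value} for msg in consumer)
        with open(path, "w", newline="") as fh:  # "w" overwrites the file on every run
            return write_rows(rows, fh)
    finally:
        consumer.close()
```

`consume_to_csv()` returns the number of rows written. Note that mode `"w"` overwrites Kafka_log.csv on every run.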

Ref: KafkaProducer Documentation
KafkaConsumer Documentation

Using Kafka’s CLI tools

kcat (previously known as kafkacat) is a command-line interface (CLI) for producing and consuming Kafka messages.
Install it with your package manager, for example:

  • macOS: brew install kcat
  • Ubuntu: apt-get install kcat
  • Note for Windows users: setting up kcat on Windows is complex. Please pair with someone using macOS or Ubuntu during recitation for this deliverable. The goal is to understand the CLI, which will be helpful in the group project when using Kafka on (Linux-based) virtual machines.

Using the kcat documentation, write a command that connects to the local Kafka broker, specifies a topic, and consumes messages from the earliest offset.

Ref: kcat usage
kcat GitHub

Optional but Recommended

For your group project, you will read movie data from the Kafka stream. Try listing all topics, then read some movielog streams to get an idea of what the data looks like:
kcat -b localhost:9092 -L
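You can also list topics from Python with kafka-python, whose `KafkaConsumer.topics()` returns the set of topic names visible to the client. The sketch assumes the movie topics share a `movielog` prefix, based on the wording above; confirm this against the `-L` output. The helper names are hypothetical.

```python
def filter_topics(topics, prefix: str = "movielog") -> list:
    """Return the topic names starting with `prefix`, sorted (the prefix is an assumption)."""
    return sorted(t for t in topics if t.startswith(prefix))


def list_topics(servers=("localhost:9092",)) -> set:
    """Ask the broker (through the tunnel) for all topic names it exposes."""
    from kafka import KafkaConsumer  # imported lazily so filter_topics works without kafka-python

    consumer = KafkaConsumer(bootstrap_servers=list(servers))
    try:
        return consumer.topics()
    finally:
        consumer.close()
```

With the tunnel open, `filter_topics(list_topics())` returns the matching topic names, which you can then read with kcat or with a consumer like the one above.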

Additional resources

Contributors

  • tanya-5