In this project, you will construct a streaming event pipeline around Apache Kafka and its ecosystem. Using public data from the Chicago Transit Authority, the pipeline will simulate and display the status of train lines in real time.
When the project is complete, you will be able to monitor a website to watch trains move from station to station.
The following are required to complete this project:
- Docker
- Python 3.7
- Access to a computer with at least 16 GB of RAM and a 4-core CPU to execute the simulation
The Chicago Transit Authority (CTA) has asked us to develop a dashboard displaying system status for its commuters. We have decided to use Kafka and ecosystem tools like REST Proxy and Kafka Connect to accomplish this task.
In addition to the course content you have already reviewed, you may find the following examples and documentation helpful in completing this assignment:
- Confluent Python Client Documentation
- Confluent Python Client Usage and Examples
- REST Proxy API Reference
- Kafka Connect JDBC Source Connector Configuration Options
To run the simulation, you must first start up the Kafka ecosystem on your machine using Docker Compose.
%> docker-compose up
Docker Compose will take 3-5 minutes to start, depending on your hardware. Please be patient and wait for the docker-compose logs to slow down or stop before beginning the simulation.
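If watching the logs feels imprecise, a small polling script can complement it. The following is a minimal sketch, not part of the project code: it assumes the HTTP services answer a plain GET on the Host URLs listed in the service table of this README, and it treats any HTTP response, including an error status, as a sign that the service is listening. The names `SERVICES`, `is_up`, and `wait_for_services` are hypothetical.

```python
import time
import urllib.error
import urllib.request

# Host URLs copied from the service table in this README.
SERVICES = {
    "Schema Registry": "http://localhost:8081",
    "REST Proxy": "http://localhost:8082",
    "Kafka Connect": "http://localhost:8083",
    "KSQL": "http://localhost:8088",
}


def is_up(url, timeout=2.0):
    """Return True if anything answers HTTP at `url` (any status code counts)."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # The server responded, just not with a 2xx status; it is listening.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, reset, and similar: not up yet.
        return False


def wait_for_services(services, probe=is_up, attempts=60, delay=5.0, sleep=time.sleep):
    """Poll until every service answers or attempts run out.

    Returns the sorted names of the services that never answered.
    """
    pending = dict(services)
    for _ in range(attempts):
        # Keep only the services that still fail the probe.
        pending = {name: url for name, url in pending.items() if not probe(url)}
        if not pending:
            break
        sleep(delay)
    return sorted(pending)
```

Calling `wait_for_services(SERVICES)` returns an empty list once all four services respond. A listening port does not guarantee that a service has finished initializing, so treat this as a hint alongside the logs rather than a replacement for them.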
Once docker-compose is ready, the following services will be available:
Service | Host URL | Docker URL | Username | Password |
---|---|---|---|---|
Public Transit Status | http://localhost:8888 | n/a | ||
Landoop Kafka Connect UI | http://localhost:8084 | http://connect-ui:8084 | ||
Landoop Kafka Topics UI | http://localhost:8085 | http://topics-ui:8085 | ||
Landoop Schema Registry UI | http://localhost:8086 | http://schema-registry-ui:8086 | ||
Kafka | PLAINTEXT://localhost:9092,PLAINTEXT://localhost:9093,PLAINTEXT://localhost:9094 | PLAINTEXT://kafka0:9092,PLAINTEXT://kafka1:9093,PLAINTEXT://kafka2:9094 | ||
REST Proxy | http://localhost:8082 | http://rest-proxy:8082/ | ||
Schema Registry | http://localhost:8081 | http://schema-registry:8081/ | ||
Kafka Connect | http://localhost:8083 | http://kafka-connect:8083 | ||
KSQL | http://localhost:8088 | http://ksql:8088 | ||
PostgreSQL | jdbc:postgresql://localhost:5432/cta | jdbc:postgresql://postgres:5432/cta | cta_admin | chicago |
Note that to access these services from your own machine, you will always use the Host URL column.
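As an illustration of using a Host URL from your own machine, the sketch below lists Kafka topics through REST Proxy. It assumes REST Proxy is reachable at the Host URL from the table and serves its v2 `GET /topics` endpoint; the helper names `topics_url` and `list_topics` are hypothetical and not part of the project code.

```python
import json
import urllib.request

# Host URL for REST Proxy, taken from the service table in this README.
REST_PROXY_HOST_URL = "http://localhost:8082"


def topics_url(base_url=REST_PROXY_HOST_URL):
    """Build the topic-listing URL, tolerating a trailing slash on the base."""
    return base_url.rstrip("/") + "/topics"


def list_topics(base_url=REST_PROXY_HOST_URL, timeout=5.0):
    """Return the topic names reported by REST Proxy as a Python list."""
    request = urllib.request.Request(
        topics_url(base_url),
        headers={"Accept": "application/vnd.kafka.v2+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Docker Compose services running, `print(list_topics())` from a terminal on your machine should show the topics that currently exist in the cluster.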
When configuring services that run within Docker Compose, like Kafka Connect, you must use the Docker URL. When you configure the JDBC Source Kafka Connector, for example, you will want to use the value from the Docker URL column.
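The distinction can be seen in one request. In the sketch below, which is an illustration under assumptions rather than the project's connector code, the script talks to Kafka Connect through its Host URL because the script runs on your machine, while the connector configuration carries the PostgreSQL Docker URL because Kafka Connect opens that connection from inside Docker. The function names and the example connector name, table, column, and topic prefix are hypothetical placeholders; the configuration keys are standard JDBC Source Connector options.

```python
import json
import urllib.request

# Host URL: this script runs on your own machine, outside Docker.
KAFKA_CONNECT_HOST_URL = "http://localhost:8083"
# Docker URL: Kafka Connect runs inside Docker Compose and opens this connection.
POSTGRES_DOCKER_URL = "jdbc:postgresql://postgres:5432/cta"


def build_jdbc_source_config(name, table, id_column, topic_prefix):
    """Build a Kafka Connect request body for a JDBC source connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": POSTGRES_DOCKER_URL,
            "connection.user": "cta_admin",      # credentials from the service table
            "connection.password": "chicago",
            "table.whitelist": table,            # hypothetical table name
            "mode": "incrementing",
            "incrementing.column.name": id_column,  # hypothetical column name
            "topic.prefix": topic_prefix,        # hypothetical topic prefix
        },
    }


def create_connector(payload, timeout=10.0):
    """POST the connector definition to Kafka Connect and return the HTTP status."""
    request = urllib.request.Request(
        KAFKA_CONNECT_HOST_URL + "/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `create_connector(build_jdbc_source_config("example-connector", "example_table", "id", "example."))` would register a connector under those placeholder names. Swapping `POSTGRES_DOCKER_URL` for the `localhost` variant would likely fail, since `localhost` inside the Kafka Connect container refers to that container and not to your machine.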
There are two pieces to the simulation: the producer and the consumer. As you develop each piece of the code, it is recommended that you only run one piece of the project at a time.
However, when you are ready to verify the end-to-end system prior to submission, it is critical that you open a terminal window for each piece and run them at the same time. If you do not run both the producer and the consumer at the same time, you will not be able to successfully complete the project.
To run the producer:
cd producers
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
python simulation.py
Once the simulation is running, you may hit Ctrl+C at any time to exit.
To run the Faust stream processing application:
cd consumers
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
faust -A faust_stream worker -l info
To run the KSQL creation script:
cd consumers
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
python ksql.py
To run the consumer:
**NOTE**: Do not run the consumer until you have reached Step 6!
cd consumers
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
python server.py
Once the server is running, you may hit Ctrl+C at any time to exit.
Our architecture will look like so: