Presto, MySQL, Minio and Kafka in Docker. Query your MySQL data and join it to Kafka and Minio data.
This repository includes a docker-compose setup for joining MySQL, Minio and Kafka data with Presto, along with notes on how to load the data and run the queries. It is deliberately not fully automated, so that you work through each step yourself.
Launch everything (Presto, Zookeeper, Kafka, MySQL, Minio):
docker-compose up
Open a MySQL shell to load some data (./data is mounted at /tmp/data inside the container):
docker-compose exec mysql mysql -uuser -ppassword wheresalice
Load the data:
source /tmp/data/load.sql
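While you are still at the MySQL prompt, you can quickly check that the load worked. This sketch assumes load.sql creates a contacts table with customer_key and email columns. The join queries later in this README expect those names, so adjust them if your load.sql differs.

```sql
-- Assumes load.sql created a `contacts` table with customer_key and email columns
SHOW TABLES;
SELECT customer_key, email FROM contacts LIMIT 5;
```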
Load some TPC-H sample data into Kafka by running the kafka-tpch loader inside the Kafka container:
docker-compose exec kafka /bin/bash
curl -o kafka-tpch https://repo1.maven.org/maven2/de/softwareforge/kafka_tpch_0811/1.0/kafka_tpch_0811-1.0.sh
chmod 755 kafka-tpch
./kafka-tpch load --brokers localhost:9092 --prefix tpch. --tpch-type tiny
exit
Open the Presto CLI:
docker-compose exec presto presto
Query MySQL data in Presto:
use mysql.wheresalice;
show tables;
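To confirm which catalogs Presto has configured, and how it maps the MySQL column types, you can run the sketch below. It assumes the catalog names used throughout this README (mysql, kafka, s3) and the contacts table used in the join queries.

```sql
-- Lists the configured catalogs; expect mysql, kafka and s3 among them
SHOW CATALOGS;
-- A fully qualified name works regardless of the current USE setting
DESCRIBE mysql.wheresalice.contacts;
```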
Query Kafka data in Presto:
SELECT _message FROM kafka.tpch.customer LIMIT 5;
SELECT sum(account_balance) FROM kafka.tpch.customer;
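If a Kafka query fails with a missing-table error, first check what the Kafka catalog exposes. This is a minimal sketch. The table names depend on the --prefix passed to kafka-tpch, and the decoded columns depend on the table description files configured for the connector.

```sql
-- List the Kafka tables Presto knows about in the tpch schema
SHOW TABLES FROM kafka.tpch;
-- Show the columns Presto exposes for the customer topic
DESCRIBE kafka.tpch.customer;
```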
Join the two together:
SELECT customer.account_balance, contacts.email FROM kafka.tpch.customer, mysql.wheresalice.contacts contacts WHERE customer.customer_key = contacts.customer_key;
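The same join can be written with explicit JOIN ... ON syntax, which tends to be easier to extend once more sources are involved. The aliases c and ct are only for readability.

```sql
-- Equivalent to the comma-style join above: Kafka customers matched to MySQL contacts
SELECT c.account_balance, ct.email
FROM kafka.tpch.customer AS c
JOIN mysql.wheresalice.contacts AS ct
  ON c.customer_key = ct.customer_key;
```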
Monitor running and completed queries in the Presto web UI: http://localhost:8080/ui/
Minio is included in this stack to stand in for S3. Using it currently takes a little manual setup: create the data directories and a sample CSV file inside the Minio container.
docker-compose exec minio /bin/sh
mkdir -p /data/catalog/ && mkdir -p /data/csvdata
echo "[email protected],alice" > /data/csvdata/data.csv
exit
Then, in the Presto CLI, create a schema and a table over the CSV data, and query it:
create schema s3.default;
create table s3.default.users (email varchar, username varchar) WITH (external_location='s3a://csvdata/',format = 'csv');
select * from s3.default.users;
SELECT users.username, contacts.customer_key FROM s3.default.users, mysql.wheresalice.contacts WHERE users.email = contacts.email;
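All three sources can be combined in a single query. This sketch uses only the tables and columns from the queries above. It returns rows only where an email in data.csv matches a MySQL contact whose customer_key also exists in the Kafka customer topic.

```sql
-- Kafka (account balances) -> MySQL (customer_key to email) -> Minio/S3 (email to username)
SELECT u.username, ct.email, c.account_balance
FROM kafka.tpch.customer AS c
JOIN mysql.wheresalice.contacts AS ct
  ON c.customer_key = ct.customer_key
JOIN s3.default.users AS u
  ON u.email = ct.email;
```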
You can also upload data into Minio from a web browser at http://localhost:9000 with these credentials:
Access Key: minio
Secret Key: minio123
- Data in Kafka must be JSON or plain Avro for Presto to parse it. Avro encoded with Confluent Schema Registry is not currently supported.
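Because JSON messages are plain text, you can also read fields straight from the raw _message column with Presto's JSON functions, without any table description. This is a sketch only. It assumes each customer message contains a top-level name key, so check the output of the _message query earlier in this README and adjust the JSON path to match.

```sql
-- json_extract_scalar pulls a scalar value out of the raw JSON message text;
-- the '$.name' path is an assumption about the message layout
SELECT json_extract_scalar(_message, '$.name') AS name
FROM kafka.tpch.customer
LIMIT 5;
```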