Pafka originated from the OpenAIOS project and leverages an optimized tiered storage access strategy to improve the overall performance of streaming/messaging systems.

License: Apache License 2.0


Pafka: Persistent Memory (PMem) Accelerated Kafka

Introduction

Pafka is an evolved version of Apache Kafka developed by MemArk. Kafka is an open-source distributed event streaming/message queue system for handling real-time data feeds efficiently and reliably. However, its performance (e.g., throughput and latency) is constrained by slow disks. Pafka enhances Kafka with Intel® Optane™ Persistent Memory (PMem), which offers much better persistence performance than HDD/SSD. With careful design and implementation, Pafka achieves 7.5 GB/s write throughput and 10 GB/s read throughput on a single CPU socket.

Pafka vs Kafka

Performance

We conducted some preliminary experiments on our in-house servers. One server is used as the Kafka broker, and another two servers as the clients. Each client server runs 16 clients to saturate the broker's throughput. We use the ProducerPerformance and ConsumerPerformance tools shipped with Kafka and a record size of 1,024 bytes for the benchmark.

Server Specification

The server spec is as follows:

| Item    | Spec                                                           |
|---------|----------------------------------------------------------------|
| CPU     | Intel(R) Xeon(R) Gold 6252 Processor (24 cores/48 threads) * 2 |
| Memory  | 376 GB                                                         |
| Network | Mellanox ConnectX-5 100 Gbps                                   |
| PMem    | 128 GB x 6 = 768 GB                                            |

The storage spec and performance:

| Storage Type | Write (MB/s)                                 | Read (MB/s) |
|--------------|----------------------------------------------|-------------|
| HDD          | 5.7 (32k batch) / 37.5 (320k) / 78.3 (3200k) | 86.5        |
| HDD RAID     | 530                                          | 313         |
| SATA SSD     | 458                                          | 300         |
| NVMe SSD     | 2,421                                        | 2,547       |
| PMem         | 9,500                                        | 37,120      |

For HDD, we use batch sizes of 32k, 320k, and 3200k for write, respectively, while read performance does not change much as the batch size increases. For the other storage types, we use a batch size of 32k, as larger batch sizes do not improve performance much. For PMem, we use the PersistentMemoryBlock of pmdk llpl for the benchmark.

Performance Results

(Figures: producer/consumer throughput and latency benchmark results)

As shown in the figures, the consumer throughput of Pafka with PMem almost reaches the network bottleneck (100 Gbps ~= 12.5 GB/s). Compared with NVMe SSD, Pafka boosts the producer throughput by 275% to 7,508.68 MB/s. In terms of latency, Pafka achieves an average latency of 0.1 seconds for both producer and consumer.

Get Started

For the complete Kafka documentation, refer to https://kafka.apache.org/documentation/.

Docker Image

The easiest way to try Pafka is to use the docker image: https://hub.docker.com/r/4pdopensource/pafka-dev

docker run -it -v $YOUR_PMEM_PATH:/mnt/mem 4pdopensource/pafka-dev bash

where $YOUR_PMEM_PATH is the mount point of PMem (DAX file system) in the host system.

If you use the docker image, you can skip the following Compile step.

Compile

Dependencies

⚠️ We have made some modifications to the original pmdk source code. Please download the pcj and llpl source code from the two modified repositories rather than from the upstream pmdk projects.

We already ship the pcj and llpl jars in the libs folder of the Pafka repository. They are compiled with Java 8 and g++ 4.8.5. In general, you do not need to compile the two libraries yourself. However, if you encounter any compilation or runtime errors caused by these two libraries, you can download the source code and compile it in your own environment.

Compile pmdk libraries

After cloning the source code:

# compile pcj
cd pcj
make && make jar
cp target/pcj.jar $PAFKA_HOME/libs

# compile llpl
cd llpl
make && make jar
cp target/llpl.jar $PAFKA_HOME/libs

Build Pafka jar

./gradlew jar

Run

Environmental setup

To check whether Pafka works, you can use any file system on a normal hard disk. For the best performance, PMem hardware mounted as a DAX file system is required.
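
For reference, the sketch below shows one common way to expose PMem as a DAX file system. It assumes an fsdax namespace is already configured (e.g., via ndctl) and appears as /dev/pmem0; adjust the device name and mount point for your environment.

# create an ext4 file system on the PMem device and mount it with DAX enabled
sudo mkfs.ext4 /dev/pmem0
sudo mkdir -p /mnt/pmem
sudo mount -o dax /dev/pmem0 /mnt/pmem

# verify that the dax mount option is active
mount | grep /mnt/pmem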

Config

To support PMem storage, we add the following config fields to the Kafka server config.

| Config              | Default Value  | Note |
|---------------------|----------------|------|
| storage.pmem.path   | /tmp/pmem      | PMem mount path |
| storage.pmem.size   | 21,474,836,480 | PMem size in bytes |
| log.pmem.pool.ratio | 0.8            | A pool of log segments will be pre-allocated. This is the proportion of the total PMem size. Pre-allocation increases the first startup time, but eliminates the dynamic allocation cost when serving requests. |
| log.channel.type    | file           | Log file channel type. Options: "file", "pmem". "file": use the normal FileChannel as vanilla Kafka does; "pmem": use PMemChannel, which uses PMem as the log storage. |

⚠️ log.preallocate has to be set to true if pmem is used, as PMem MemoryBlock does not support append-like operations.

Sample config in config/server.properties is as follows:

storage.pmem.path=/mnt/pmem/kafka/
storage.pmem.size=21474836480
log.pmem.pool.ratio=0.6
log.channel.type=pmem
# log.preallocate has to be set to true if pmem is used
log.preallocate=true

Start Pafka

Follow the instructions at https://kafka.apache.org/quickstart. Basically:

bin/zookeeper-server-start.sh config/zookeeper.properties > zk.log 2>&1 &
bin/kafka-server-start.sh config/server.properties > pafka.log 2>&1 &
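
If topic auto-creation is disabled in your broker configuration, create the topic used by the benchmarks below first. This uses standard Kafka tooling; the topic name and partition count here are only examples.

bin/kafka-topics.sh --create --topic test --bootstrap-server localhost:9092 --partitions 16 --replication-factor 1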

Benchmark Pafka

Producer:

# bin/kafka-producer-perf-test.sh --topic $TOPIC --throughput $MAX_THROUGHPUT --num-records $NUM_RECORDS --record-size $RECORD_SIZE --producer.config config/producer.properties --producer-props bootstrap.servers=$BROKER_IP:$PORT
bin/kafka-producer-perf-test.sh --topic test --throughput 1000000 --num-records 1000000 --record-size 1024 --producer.config config/producer.properties --producer-props bootstrap.servers=localhost:9092

Consumer:

# bin/kafka-consumer-perf-test.sh --topic $TOPIC --consumer.config config/consumer.properties --bootstrap-server $BROKER_IP:$PORT --messages $NUM_RECORDS --show-detailed-stats --reporting-interval $REPORT_INTERVAL --timeout $TIMEOUT_IN_MS
bin/kafka-consumer-perf-test.sh --topic test --consumer.config config/consumer.properties --bootstrap-server localhost:9092 --messages 1000000 --show-detailed-stats --reporting-interval 1000 --timeout 100000
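
In these commands, --throughput throttles the producer to approximately that many records per second (use -1 to disable throttling), --num-records and --messages set the total number of records, and --record-size is in bytes; 1,000,000 records of 1,024 bytes each correspond to roughly 1 GB of data per run.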

Limitations

  • We have only benchmarked the performance in a single-server setting. A multiple-server benchmark is in progress.

  • pmdk llpl MemoryBlock does not provide a ByteBuffer API. We did some hacking to provide a zero-copy ByteBuffer API, so you may see some warnings from JREs with version >= 9. We have tested on Java 8, Java 11, and Java 15.

    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by com.intel.pmem.llpl.MemoryAccessor (file:/4pd/home/zhanghao/workspace/kafka/core/build/dependant-libs-2.13.4/llpl.jar) to field java.nio.Buffer.address
    WARNING: Please consider reporting this to the maintainers of com.intel.pmem.llpl.MemoryAccessor
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release

  • Currently, only the log files are stored in PMem, while the indexes are still kept as normal files, as we do not see much performance gain from moving the indexes to PMem.

  • The current released version (v0.1.x) uses PMem as the only storage device, which may limit its use in scenarios that require a large capacity for log storage. The next release (v0.2.0) will address this issue by introducing a tiered storage strategy.

Roadmap

| Version | Status                           | Features |
|---------|----------------------------------|----------|
| v0.1.1  | Released                         | Use PMem for data storage; significant performance boost compared with Kafka |
| v0.2.0  | To be released in September 2021 | A layered storage strategy to utilize the total capacity of all storage devices (HDD/SSD/PMem) while maintaining efficiency via our cold-hot data migration algorithms; further PMem performance improvement by using libpmem |

Community

Pafka is developed by MemArk (https://memark.io/en), a tech community focusing on leveraging modern storage architectures for system enhancement. MemArk is led by 4Paradigm (https://www.4paradigm.com/) and other sponsors (such as Intel). Please join our community for:

  • Chatting: For any feedback, suggestions, issues, or anything else about using Pafka or other storage-related topics, join our interactive discussion channel on Slack: #pafka-help
  • Development discussion: To formally report a bug or make a suggestion, please use GitHub Issues; to propose a new feature for discussion or to start a pull request, please use GitHub Discussions, and our developers will respond promptly.

You can also contact the authors directly with any feedback.
