4paradigm / pafka

This project is forked from apache/kafka

Pafka originated from the OpenAIOS project. It leverages an optimized tiered storage access strategy to improve the overall performance of streaming/messaging systems.

License: Apache License 2.0

Shell 0.29% Java 73.49% Scala 23.17% HTML 0.01% Python 2.82% Batchfile 0.08% XSLT 0.02% Dockerfile 0.03% Roff 0.10%

java kafka message-queue pmem scala streaming

pafka's People

Contributors

org-mars zhanghaohit lumianph wlinuxhv www6v galallino

pafka's Issues

bench.py consumer stats not correct

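In the consumer benchmark, the last few "records received" lines are incorrect: once clients start completing, the aggregated record count drops instead of continuing to grow.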

[21/10/2021 21:57:03][0] 6481064 records received, 3232811.41 records/sec (3157.04 MB/s), 4.85 ms avg latency, 1170.00 ms max latency (aggregated from 32 clients)
[21/10/2021 21:57:05][2] 18411514 records received, 5950367.06 records/sec (5810.91 MB/s), 3.40 ms avg latency, 1170.00 ms max latency (aggregated from 32 clients)
[21/10/2021 21:57:07][4] 29844656 records received, 5703169.78 records/sec (5569.50 MB/s), 3.15 ms avg latency, 1170.00 ms max latency (aggregated from 32 clients)
[21/10/2021 21:57:09][6] 42312146 records received, 6216098.09 records/sec (6070.41 MB/s), 2.97 ms avg latency, 1170.00 ms max latency (aggregated from 32 clients)
[21/10/2021 21:57:11][8] 55931432 records received, 6795711.71 records/sec (6636.44 MB/s), 2.80 ms avg latency, 1170.00 ms max latency (aggregated from 32 clients)
[21/10/2021 21:57:13][10] 71915659 records received, 8104939.24 records/sec (7914.98 MB/s), 2.61 ms avg latency, 1170.00 ms max latency (aggregated from 32 clients)
1 clients completed. Wait for 31 other clients
[21/10/2021 21:57:14][11] 85614838 records received, 7971641.13 records/sec (7784.80 MB/s), 2.45 ms avg latency, 1170.00 ms max latency (aggregated from 31 clients)
9 clients completed. Wait for 23 other clients
[21/10/2021 21:57:15][12] 70314141 records received, 5063298.58 records/sec (4944.63 MB/s), 2.45 ms avg latency, 1115.00 ms max latency (aggregated from 23 clients)
26 clients completed. Wait for 6 other clients
[21/10/2021 21:57:15][12] 18750685 records received, 1108660.15 records/sec (1082.68 MB/s), 2.65 ms avg latency, 856.00 ms max latency (aggregated from 6 clients)
All 32 clients completed
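
The log suggests that the total is summed only over the clients that are still running ("aggregated from 23 clients", then 6). One possible fix is to keep each client's last reported count after it completes. The following is a minimal sketch under that assumption; the class and method names are hypothetical and are not taken from bench.py.

```python
from dataclasses import dataclass, field


@dataclass
class ConsumerAggregator:
    """Hypothetical aggregator for per-client cumulative record counts."""

    # Latest cumulative record count reported by each client, keyed by client id.
    latest: dict = field(default_factory=dict)
    # Ids of clients that have finished.
    completed: set = field(default_factory=set)

    def report(self, client_id, records_received):
        """Store the newest cumulative count for one client."""
        self.latest[client_id] = records_received

    def complete(self, client_id):
        """Mark a client as finished but keep its final count."""
        self.completed.add(client_id)

    def total_records(self):
        """Sum over all clients, finished or not, so the total never shrinks."""
        return sum(self.latest.values())

    def active_clients(self):
        """Number of clients that have reported and not yet completed."""
        return len(self.latest) - len(self.completed)
```

With this shape, a completed client still contributes its final count to `total_records()`, while `active_clients()` can be used for the "Wait for N other clients" message.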

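Pafka needs to flush to disk synchronously when writing to PMem

Questions:
1. PMem is positioned as persistent memory. It can be used as a disk, but that is not what PMem was designed for.
2. If PMem fills up and the data must be retained, how is it migrated?
3. If flushing from PMem to disk is supported later, a PMem data eviction policy/threshold needs to be considered so that PMem space always stays available and its performance remains stable.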
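
Point 3 could be approached with a high/low watermark policy: when PMem usage exceeds a high threshold, the oldest segments are migrated to the second layer until usage falls to a low threshold. The following is a minimal sketch of the selection step only. The `Segment` type, the threshold values, and the oldest-first order are assumptions for illustration, not Pafka's actual behavior.

```python
from collections import namedtuple

# Hypothetical description of a log segment stored on PMem.
Segment = namedtuple("Segment", ["name", "size_bytes"])


def select_segments_to_migrate(segments, capacity_bytes,
                               high_watermark=0.9, low_watermark=0.7):
    """Pick segments to move off PMem.

    `segments` must be ordered oldest first. Nothing is selected until usage
    exceeds `high_watermark`; then segments are selected oldest first until
    the remaining usage is at or below `low_watermark`.
    """
    used = sum(s.size_bytes for s in segments)
    if used <= high_watermark * capacity_bytes:
        return []

    target = low_watermark * capacity_bytes
    to_migrate = []
    for seg in segments:
        if used <= target:
            break
        to_migrate.append(seg)
        used -= seg.size_bytes
    return to_migrate
```

Using two thresholds rather than one avoids migrating a single segment every time usage crosses the limit.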

Support SSD/NVME as first layer storage

Current design

First layer: PMem
Second layer: any generic file system (HDD/SSD/NVME)

Proposed design

First layer: any generic file system (SSD/NVME) and PMem
Second layer: any generic file system (HDD/SSD/NVME)

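Pafka metrics should include PMem metrics

Requirement:
Pafka metrics should include PMem metrics.
Reason:
This makes it easier to monitor PMem usage and related information, troubleshoot performance problems, and tune the system.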
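
As an illustration of the kind of usage gauge requested here: assuming the PMem device is mounted as a file system, capacity figures can be read from the mount point. The sketch below uses only Python's standard library. The metric names are hypothetical, and a real implementation would presumably be exposed through the broker's own Java metrics instead.

```python
import shutil


def pmem_usage_metrics(mount_path):
    """Return capacity gauges for the file system that contains `mount_path`.

    The metric names are made up for this example.
    """
    usage = shutil.disk_usage(mount_path)
    return {
        "pmem.total.bytes": usage.total,
        "pmem.used.bytes": usage.used,
        "pmem.free.bytes": usage.free,
        # Guard against a zero-sized file system.
        "pmem.used.ratio": usage.used / usage.total if usage.total else 0.0,
    }
```

For example, `pmem_usage_metrics("/pmem0/pafka")` would report usage for whichever file system backs that directory.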

Support multiple devices for each storage layer

Problem

Currently Pafka allows only one PMem location to be configured for the high storage layer and one HDD location for the low storage layer. If a server has multiple devices (e.g., two PMem devices), we may want to use all of them to store data.

Current solution

Use LVM or RAID to combine multiple physical devices into a single virtual device. Pafka then sees only one (virtual) device.

Proposed solution

Support multiple devices natively in Pafka. The config might look like this:

    storage.pmem.paths=/pmem0/pafka,/pmem1/pafka
    storage.hdd.paths=/hdd0/pafka,/hdd1/pafka
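
A minimal sketch of how such a comma-separated value could be parsed and used to spread new files across devices. The round-robin placement and the function names are assumptions; the issue itself does not specify a placement policy.

```python
import itertools


def parse_paths(value):
    """Split a comma-separated path list, ignoring blanks and extra spaces."""
    return [p.strip() for p in value.split(",") if p.strip()]


def round_robin(paths):
    """Return an endless iterator that cycles through the configured paths."""
    return itertools.cycle(paths)


# Value copied from the proposed storage.pmem.paths setting.
pmem_paths = parse_paths("/pmem0/pafka,/pmem1/pafka")
chooser = round_robin(pmem_paths)
```

Each call to `next(chooser)` yields the directory for the next new file. A policy based on free space per device would be another option.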

update with kafka 3.0.0

It's time to update to the new Kafka version.

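Suggestion: use LLPL's libpmem interface in Pafka

Problem:
On the previous-generation Cascade Lake platform, Pafka stress tests run into a problem with concurrent reads and writes. This is because Pafka currently does not use LLPL's libpmem interface and therefore cannot write through (PMEM_F_MEM_NONTEMPORAL).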