
Comments (3)

jinmingjian commented on May 23, 2024

@sanikolaev thanks for your interest. I have been busy with too many things these days :)

  1. DRAM is 32 GB × 6 = 192 GB (6-channel, 32 GB per channel, a standard configuration for a single-socket Xeon-SP bare-metal server). NOTE: the size of RAM is not important here, because we run each query multiple times (so the data sits in the various caches) and the query set is far smaller than 192 GB. (But it will be much more interesting to show a truly large dataset in the future.)

  2. The data is simple: 2 columns, one 32-bit integer per column (Datetime is implemented as 32-bit in both CH and TB), in a 1.47B-row stripped NYC taxi dataset. NOTE: the total number of columns in a table is not important here, because we are talking about column-wise stores.

  3. I am working on initial String support, so some early benchmark results from TPC-H should follow soon. (The alpha website was released sooner than I had imagined.)

  4. There is no paper yet, for lack of time... The initial storage engine is, in fact, primitive. The interesting part is how the data gets into storage: it does not use the common LSM tree (or anything similar), unlike CH and even the most popular open-source peers. The drawback of the LSM tree is that you pay for its fast writes over the long run. Two further questions you can ask here:

  • Does the LSM tree achieve the global optimum for servers that run 7x24?
  • How fast could we be if we discarded the LSM tree?
    And I think TensorBase gives its own innovative answers :smile:
  5. The full open-source release could come faster than I thought. Before that happens, I would like to invite some early users/people/partners to join the work more quickly. If you and others are interested in this, you can contact me through any channel.
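The "pay in the long run" point about LSM trees is usually quantified as write amplification: with leveled compaction, every byte a user writes is rewritten roughly once per level it passes through, multiplied by the level fanout. A minimal back-of-the-envelope sketch (the memtable size and fanout here are illustrative assumptions, not TensorBase or ClickHouse settings):

```python
import math

def leveled_write_amp(data_size_gb, memtable_gb=0.064, fanout=10):
    """Rough write amplification of a leveled-compaction LSM tree."""
    # Number of levels needed so fanout^levels * memtable covers the data.
    levels = max(1, math.ceil(math.log(data_size_gb / memtable_gb, fanout)))
    # Each byte is rewritten roughly `fanout` times per level, plus the
    # initial WAL write and memtable flush (the constant 2).
    return 2 + fanout * levels

print(leveled_write_amp(1000))  # → 52
```

So a 1 TB dataset can cost on the order of 50 device writes per logical write under this toy model, which is the long-run cost a write-optimized design avoids by not using an LSM tree.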

from tensorbase_frontier_edition.

sanikolaev commented on May 23, 2024

NOTE: the size of RAM is not important here, because we run each query multiple times (so the data sits in the various caches)

Why is that? It doesn't seem practical to me to measure only hot queries. In real analytics, running real queries, the chance of an IO operation is high.

and the query set is far smaller than 192 GB. (But it will be much more interesting to show a truly large dataset in the future.)

Yes, and even if you measure only hot queries it will be interesting to see the results when the data can't fully fit into RAM. Then you have to read from disk, and the storage format becomes the key thing: how well the data is compressed, in how many IOPS you can read it, what exactly you keep in the limited RAM while processing the query, etc.
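To make the disk-bound case concrete, here is a rough cold-scan model. Only the dataset shape (1.47B rows, two 32-bit columns) comes from the thread; the compression ratio, disk throughput, and decompression speed are illustrative assumptions:

```python
def cold_scan_seconds(raw_gb, compression_ratio, disk_mb_s, decomp_gb_s):
    """Estimate a cold full scan, assuming IO and decompression overlap."""
    compressed_gb = raw_gb / compression_ratio
    io_s = compressed_gb * 1024 / disk_mb_s   # time to read from disk
    cpu_s = raw_gb / decomp_gb_s              # time to decompress
    return max(io_s, cpu_s)

# The thread's dataset: 1.47B rows x 2 columns x 4 bytes ~= 11.76 GB raw.
raw_gb = 1.47e9 * 2 * 4 / 1e9
print(round(cold_scan_seconds(raw_gb, 3, 500, 4), 2))  # → 8.03
```

Under these assumptions the scan is IO-bound, so doubling the compression ratio roughly halves the query time, which is exactly why the storage format dominates once data spills out of RAM.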

I'll be happy to play with the open-source version when it's available.


jinmingjian commented on May 23, 2024

My answer is threefold:

  1. A benchmark should happen in an "apples to apples" context (in fact, many benchmarks fail to achieve this). Enabling the cache effect is exactly what makes the comparison apples to apples, because loading data from disk is too complex to control: cache mechanisms (how data is loaded from disk) vary widely, and the compression you mention is just one part of the bigger picture affecting loading. For example, I may use two layers of cache while you use only one; you may be faster on the one-shot first load, but I am better on overall performance. This is why the modern x86 CPU has a three-layer L1/L2/L3 cache.
  2. Also, as you said, hot data can be hit in the (memory) cache, so this comparison is still meaningful for a good share of real-world cases. This is why we have all these kinds of caches.
  3. You are right: IO-bound benchmarks are another important scenario, because data can't always fully fit into RAM. This is workload dependent, and it is in fact what TensorBase wants to solve better than its open-source peers. It should be possible to show interesting results in the next round of benchmarks, but the caveat is still "apples to apples". We can discuss this then.
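The "run each query multiple times so the data sits in cache" methodology discussed above can be sketched as a tiny benchmark harness: warmup passes populate whatever caches exist, and only hot runs are reported. This is a hypothetical illustration of the methodology, not the harness the authors actually used:

```python
import statistics
import time

def bench(fn, warmup=2, runs=5):
    """Time fn() hot: warm the caches first, then measure repeated runs."""
    # Warmup passes pull the data into the various caches, so the
    # measured runs compare engines "apples to apples" when hot.
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return min(times), statistics.median(times)

data = list(range(1_000_000))
best, median = bench(lambda: sum(data))
print(f"best={best:.4f}s median={median:.4f}s")
```

Reporting both the best and the median of the hot runs makes it visible when a result is noisy rather than genuinely cache-resident.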

