Collaborations on columnar data structures (cudf, closed)

rapidsai commented on May 14, 2024

Excited to see this new org created. I am interested to see if Apache Arrow (i.e. cont…

Comments (6)

tmostak commented on May 14, 2024

Hi @wesm, thanks for this. Yes, we are excited about Arrow (even though we only support a subset of it at the moment) because it provides interoperability with lots of other tools and makes sense as a way to represent columnar data. I don't see any reason why it shouldn't perform well on the GPU, since the MapD native format is quite similar (except that we store nulls inline when possible to save space and bandwidth). Would it make sense to set up a call with the project members so we can discuss ways to collaborate?

wesm commented on May 14, 2024

That sounds good to me. Adding @julienledem @xhochy since they will be interested, and maybe others from the Apache Arrow team.

I am interested in:

Ingest data (zero-copy, preferably) from Arrow record batches
Ingest data to MapD from Arrow
Export data as Arrow record batches
UDF protocol for batch-based UDFs
Benchmarks and analysis of the pros and cons of different columnar memory layouts on the GPU (you say you store nulls inline -- does that mean sentinel values? Otherwise I am not sure how you could be more efficient than 1 bit per value for data that has nulls).
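To make the bitmap-versus-sentinel question concrete, here is a minimal Python sketch; it is not MapD's or Arrow's actual code. It builds an Arrow-style validity bitmap (one bit per slot, bit set means non-null, least-significant bit first within each byte) next to a hypothetical sentinel scheme that reserves the minimum 32-bit integer to mean NULL. The names `INT32_NULL_SENTINEL`, `encode_with_sentinel`, and `encode_with_bitmap` are illustrative assumptions, and Arrow's buffer padding and alignment rules are not modeled.

```python
from array import array

# Hypothetical sentinel: reserve the minimum 32-bit integer to mean NULL.
INT32_NULL_SENTINEL = -(2**31)


def build_validity_bitmap(values):
    """Arrow-style validity bitmap: bit i is 1 when values[i] is not null.

    Bits are numbered least-significant bit first within each byte.
    """
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def is_valid(bitmap, i):
    """Read bit i of the validity bitmap."""
    return ((bitmap[i // 8] >> (i % 8)) & 1) == 1


def encode_with_sentinel(values):
    """Nulls stored inline: no extra buffer, but the sentinel value itself is unusable."""
    return array("i", (INT32_NULL_SENTINEL if v is None else v for v in values))


def encode_with_bitmap(values):
    """Nulls tracked in a separate bitmap; null slots hold an arbitrary value (0 here)."""
    data = array("i", (0 if v is None else v for v in values))
    return data, build_validity_bitmap(values)


column = [7, None, 42, -3, None, 5, 9, 11, 13]
sentinel_data = encode_with_sentinel(column)
bitmap_data, validity = encode_with_bitmap(column)

sentinel_bytes = len(sentinel_data) * sentinel_data.itemsize
bitmap_bytes = len(bitmap_data) * bitmap_data.itemsize + len(validity)
print(sentinel_bytes, bitmap_bytes)  # 36 38 where array "i" is 4 bytes
```

The trade-off shows up directly: the sentinel layout costs no extra bytes but makes one value unrepresentable, while the bitmap layout adds ceil(n/8) bytes (2 bytes for these nine values) and keeps the full value range.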

As background, I did some GPU development for accelerating Bayesian inference problems years ago and did a fair amount of CUDA C and PyCUDA work, so I've had a long-standing interest in architecting data structures and memory access patterns for the GPU.

billmaimone commented on May 14, 2024

Bingo on all fronts; these are all things I mentioned in the talk I gave at GTC last week. We also have some basic work to do to support the rest of the data types (the prototype handled only simple, uncompressed numerics to keep it simple).

wesm commented on May 14, 2024

Does the GPU benefit from columnar compression techniques like CPU-based columnar databases do?

m1mc commented on May 14, 2024

@wesm, we already have some in the core engine, such as dictionary compression, and we are planning to tokenize any string column that contains only digits to save memory. These techniques don't need to be columnar, though, if you just mean something like RLE or HCC. All of these approaches aim to keep GPU decoding fast.
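A minimal Python sketch of the two ideas in this reply, assuming a column is just a list of strings; neither function is MapD's implementation. `dictionary_encode` replaces each distinct string with a small integer code, so a kernel only has to read fixed-width codes. `tokenize_digit_column` is one hypothetical reading of "tokenizing" a digits-only string column: store it as integers when that round-trips exactly, refusing values with leading zeros because `int()` would drop them.

```python
def dictionary_encode(strings):
    """Replace each distinct string with an integer code (codes assigned in first-seen order)."""
    dictionary = []  # code -> string
    index = {}       # string -> code
    codes = []
    for s in strings:
        code = index.get(s)
        if code is None:
            code = len(dictionary)
            index[s] = code
            dictionary.append(s)
        codes.append(code)
    return codes, dictionary


def dictionary_decode(codes, dictionary):
    """Rebuild the original strings by looking each code up in the dictionary."""
    return [dictionary[c] for c in codes]


def tokenize_digit_column(strings):
    """Hypothetical rule: if every value is a run of ASCII digits with no leading zero
    (so int() round-trips exactly), store the column as integers; otherwise return None."""

    def round_trips(s):
        return s.isascii() and s.isdigit() and (s == "0" or s[0] != "0")

    if strings and all(round_trips(s) for s in strings):
        return [int(s) for s in strings]
    return None


states = ["CA", "NY", "CA", "TX", "NY", "CA"]
codes, dictionary = dictionary_encode(states)
print(codes, dictionary)  # [0, 1, 0, 2, 1, 0] ['CA', 'NY', 'TX']
print(tokenize_digit_column(["94105", "10001", "60601"]))  # [94105, 10001, 60601]
print(tokenize_digit_column(["02134", "94105"]))  # None: the leading zero would be lost
```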

billmaimone commented on May 14, 2024

PFOR and run-length encoding are also planned, but not yet done.

On Thu, May 18, 2017 at 2:56 PM, Minggang Yu wrote:
@wesm, we already have some in core engine like dictionary compression. And we are planning to tokenize any string column that only has digits to save memory.
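For reference, a minimal Python sketch of run-length encoding and a simplified patched frame-of-reference (PFOR) scheme. This is not MapD's planned implementation: the function names are hypothetical, the base is simply the column minimum, and the offsets stay in a Python list rather than being bit-packed into `bit_width` bits as a real PFOR encoder would do. Values whose offset does not fit in `bit_width` bits are kept separately as exceptions and patched back in on decode.

```python
def rle_encode(values):
    """Run-length encoding: collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def pfor_encode(values, bit_width):
    """Simplified patched frame-of-reference.

    Each value is stored as an offset from `base`; offsets that need more than
    `bit_width` bits become exceptions kept as (position, value) pairs.
    """
    if not values:
        return 0, [], []
    base = min(values)
    limit = 1 << bit_width
    offsets, exceptions = [], []
    for i, v in enumerate(values):
        delta = v - base
        if delta < limit:
            offsets.append(delta)
        else:
            offsets.append(0)  # placeholder slot, overwritten by the patch on decode
            exceptions.append((i, v))
    return base, offsets, exceptions


def pfor_decode(base, offsets, exceptions):
    """Add the base back to every offset, then patch the outliers in place."""
    out = [base + d for d in offsets]
    for i, v in exceptions:
        out[i] = v
    return out


print(rle_encode([0, 0, 0, 1, 1, 0]))  # [(0, 3), (1, 2), (0, 1)]
base, offsets, exceptions = pfor_encode([100, 101, 103, 100, 5000, 102], bit_width=4)
print(base, offsets, exceptions)  # 100 [0, 1, 3, 0, 0, 2] [(4, 5000)]
```

RLE pays off on sorted or low-cardinality columns with long runs; PFOR pays off when most values sit in a narrow range above the base and only a few outliers need patching.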
