
Comments (6)

tmostak avatar tmostak commented on May 14, 2024

Hi @wesm, thanks for this. Yes, we are excited about Arrow (even though we only support a subset at the moment) because it provides interoperability with many other tools and makes sense as a way to represent columnar data. I don't see any reason it should not be performant on the GPU, as the MapD native format is quite similar (except that we store nulls in-line when possible to save space and bandwidth). Would it make sense to set up a call with the project members so we can discuss ways to collaborate?

from cudf.

wesm avatar wesm commented on May 14, 2024

That sounds good to me. Adding @julienledem @xhochy since they will be interested, and maybe others from the Apache Arrow team.

I am interested in:

  • Ingest data (zero-copy, preferably) from Arrow record batches
  • Ingest data to MapD from Arrow
  • Export data as Arrow record batches
  • UDF protocol for batch-based UDFs
  • Benchmarks and analysis of pros/cons of different columnar-type memory layouts on the GPU (you say you store nulls inline -- does that mean sentinel values? Otherwise I am not sure how you could be more efficient than 1 bit per value for data that has nulls).
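
To make the null-layout question concrete, here is a minimal Python sketch of the two options being compared. It assumes an int32 column; the helper names and the choice of the minimum int32 value as the sentinel are hypothetical and are not taken from MapD or any Arrow library. The bitmap variant follows the Arrow convention of one validity bit per value with least-significant-bit numbering, where a set bit means the value is valid.

```python
from array import array

# Hypothetical sentinel: reserve the minimum int32 value to mean NULL.
INT32_NULL = -2**31


def to_bitmap_layout(values):
    """Arrow-style layout: dense values buffer plus a validity bitmap."""
    data = array("i", (0 if v is None else v for v in values))
    bitmap = bytearray((len(values) + 7) // 8)  # 1 bit per value, rounded up
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # LSB numbering, 1 means valid
    return data, bitmap


def to_sentinel_layout(values):
    """Inline-null layout: NULL is encoded in the values buffer itself."""
    if any(v == INT32_NULL for v in values if v is not None):
        raise ValueError("a real value collides with the sentinel")
    return array("i", (INT32_NULL if v is None else v for v in values))


def bitmap_is_valid(bitmap, i):
    """Return True if slot i is marked valid in the bitmap."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


column = [1, None, 3, None, 5, 6, 7, 8, None]
data, bitmap = to_bitmap_layout(column)
inline = to_sentinel_layout(column)

# Extra bytes spent on null tracking in each layout.
bitmap_overhead = len(bitmap)  # ceil(n / 8) bytes
sentinel_overhead = 0          # no extra buffer, but one value is reserved
```

The trade-off this illustrates: the sentinel layout needs no second buffer and no second memory read, but it gives up one representable value per type, while the bitmap costs one bit per value and keeps the full value range.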

As background, I did some GPU development for accelerating Bayesian inference problems years ago and did a fair amount of CUDA C and PyCUDA work, so I've had a long-standing interest in architecting data structures and memory access patterns for the GPU.


billmaimone avatar billmaimone commented on May 14, 2024

Bingo on all fronts; these are all things I mentioned in the talk I gave last week at GTC. We also have some basic work to do to support the rest of the data types (the prototype handled only simple, uncompressed numerics to keep things simple).


wesm avatar wesm commented on May 14, 2024

Does the GPU benefit from columnar compression techniques the way CPU-based columnar databases do?


m1mc avatar m1mc commented on May 14, 2024

@wesm, we already have some in the core engine, such as dictionary compression. We are also planning to tokenize any string column that contains only digits to save memory. These don't require a columnar layout, though, if you just mean something like RLE or HCC. All of these approaches aim to keep GPU decoding fast.
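
For readers unfamiliar with the terms, here is a minimal Python sketch of dictionary compression and run-length encoding (RLE) applied to a string column. It is a generic illustration only; the function names are hypothetical and it does not reflect how the MapD core engine implements either technique.

```python
def dictionary_encode(column):
    """Replace each distinct value with a small integer code."""
    dictionary = []  # code -> value, in first-seen order
    index_of = {}    # value -> code
    codes = []
    for value in column:
        code = index_of.get(value)
        if code is None:
            code = len(dictionary)
            index_of[value] = code
            dictionary.append(value)
        codes.append(code)
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Decoding is a single indexed lookup per row."""
    return [dictionary[code] for code in codes]


def run_length_encode(codes):
    """Collapse consecutive repeats into (code, run_length) pairs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return [tuple(run) for run in runs]


column = ["SFO", "SFO", "JFK", "SFO", "JFK", "JFK"]
dictionary, codes = dictionary_encode(column)
runs = run_length_encode(codes)
restored = dictionary_decode(dictionary, codes)
```

Dictionary decoding is one independent lookup per row, which is why it parallelizes easily. Decoding RLE requires knowing where each run starts, so a parallel decoder typically needs a prefix sum over the run lengths first.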


billmaimone avatar billmaimone commented on May 14, 2024

