Ballista

Overview

Ballista is an experimental distributed compute platform, powered by Apache Arrow, with support for Rust and JVM (Java, Kotlin, and Scala).

The foundational technologies in Ballista are:

  • Apache Arrow Flight protocol for efficient data transfer between processes.
  • Google Protocol Buffers for serializing query plans.
  • Docker for packaging up executors along with user-defined code.
  • Kubernetes for deployment and management of the executor docker containers.

Why Ballista?

Ballista is at a very early stage of development and is of limited practical use today. The goal is to demonstrate a number of benefits that follow from choosing Apache Arrow as the memory model.

Having a common memory model removes the overhead associated with supporting multiple programming languages. It makes it possible for high-level languages such as Python and Java to delegate operations to lower-level languages such as C++ and Rust without the need to serialize or copy data within the same process (pointers to memory can be passed instead).
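As a rough illustration of passing pointers instead of copying data within a process, the sketch below shares one contiguous buffer between a caller and a compute "kernel". It uses only the Rust standard library. `Int64Column` and `sum_kernel` are hypothetical names invented for this example and are not part of Arrow or Ballista.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow-style column: one contiguous,
/// immutable buffer of 64-bit integers behind a reference-counted pointer.
#[derive(Clone)]
pub struct Int64Column {
    values: Arc<[i64]>,
}

impl Int64Column {
    /// Moves the values into a shared buffer once, at construction time.
    pub fn new(values: Vec<i64>) -> Self {
        Self { values: values.into() }
    }

    /// Address of the first element, exposed only to show that cloning
    /// the column does not copy the underlying data.
    pub fn data_ptr(&self) -> *const i64 {
        self.values.as_ptr()
    }

    /// Borrowed view of the buffer.
    pub fn values(&self) -> &[i64] {
        &self.values
    }
}

/// A "kernel" that receives a borrowed view of the buffer rather than a copy.
pub fn sum_kernel(values: &[i64]) -> i64 {
    values.iter().sum()
}
```

Cloning an `Int64Column` only increments a reference count, so both handles report the same `data_ptr()`. A real Arrow integration would additionally rely on the standardized columnar layout so that code written in another language can interpret the same bytes.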

The common memory model also makes it possible to transfer data extremely efficiently between processes (regardless of implementation programming language) because the memory format is also the serialization format. These network transfers could be between processes on the same node (or in the same Kubernetes pod), or between different pods within a cluster.
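The idea that the memory format doubles as the serialization format can be sketched with a toy fixed-width buffer. This is a simplification under stated assumptions: it is not the Arrow IPC or Flight wire format, and `Int32Buffer` with its methods is a hypothetical type made up for illustration.

```rust
/// Hypothetical fixed-width column stored as raw little-endian bytes,
/// loosely modelled on how a columnar format lays out primitive values.
pub struct Int32Buffer {
    bytes: Vec<u8>,
}

impl Int32Buffer {
    /// Builds the in-memory buffer from values.
    pub fn from_values(values: &[i32]) -> Self {
        let mut bytes = Vec::with_capacity(values.len() * 4);
        for v in values {
            bytes.extend_from_slice(&v.to_le_bytes());
        }
        Self { bytes }
    }

    /// "Serializing" is just exposing the bytes already held in memory.
    pub fn as_wire_bytes(&self) -> &[u8] {
        &self.bytes
    }

    /// "Deserializing" adopts the received bytes as the in-memory buffer,
    /// with no per-value decoding step. Rejects lengths that are not a
    /// multiple of four bytes.
    pub fn from_wire_bytes(bytes: Vec<u8>) -> Option<Self> {
        if bytes.len() % 4 == 0 {
            Some(Self { bytes })
        } else {
            None
        }
    }

    /// Reads the value at index `i` directly from the buffer.
    pub fn get(&self, i: usize) -> Option<i32> {
        let start = i.checked_mul(4)?;
        let end = start.checked_add(4)?;
        let c = self.bytes.get(start..end)?;
        Some(i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
    }
}
```

The receiving side never converts values one by one. It takes ownership of the bytes and reads values in place, which is the property the paragraph above describes.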

There are different value propositions for different audiences.

Ballista for Rustaceans

Ballista will provide a distributed compute environment where it will be possible for all processing, including user-defined code, to happen in Rust.

However, Ballista will also provide interoperability with other ecosystems, including Apache Spark, allowing Rust to be introduced gradually into existing pipelines.

Ballista for JVM Developers

Ballista provides a JVM query engine (implemented in Kotlin) as well as interoperability with Apache Spark (implemented in Scala). Ballista will also provide support for JNI integration with C++ and/or Rust compute kernels (such as delegating to Gandiva or DataFusion).

This will allow JVM developers to leverage their investment in existing code and ecosystem whilst taking advantage of the memory efficiency and increased performance from offloading certain operations to lower-level languages.

Examples

The following examples should help illustrate the current capabilities of Ballista.

Status

  • It is possible to manually execute distributed hash aggregate queries and simple filters and projections in Rust and JVM.
  • The distributed scheduler is still being designed, with the hope that it will be available in the summer of 2020.
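For readers unfamiliar with the term, a distributed hash aggregate is commonly evaluated in two phases: each executor aggregates its own partition, then the partial results are merged. The sketch below shows that general technique for a grouped sum using only the standard library. It is not Ballista's implementation or API, and the function names `partial_aggregate` and `merge_partials` are hypothetical.

```rust
use std::collections::HashMap;

/// Phase 1: an executor aggregates its own partition of (key, value) rows
/// into partial sums keyed by group.
pub fn partial_aggregate(partition: &[(&str, i64)]) -> HashMap<String, i64> {
    let mut partial = HashMap::new();
    for (key, value) in partition {
        *partial.entry(key.to_string()).or_insert(0) += *value;
    }
    partial
}

/// Phase 2: the partial results from all executors are merged into the
/// final per-group sums.
pub fn merge_partials(partials: Vec<HashMap<String, i64>>) -> HashMap<String, i64> {
    let mut merged = HashMap::new();
    for partial in partials {
        for (key, value) in partial {
            *merged.entry(key).or_insert(0) += value;
        }
    }
    merged
}
```

Because a sum can be combined from partial sums, only the small per-group maps need to cross the network rather than the raw rows. Aggregates such as averages need extra state (for example a sum and a count) carried through both phases.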

Documentation

The Ballista User Guide is hosted on the Ballista website, along with the Ballista Blog where news and release notes are posted.

Contributing

See CONTRIBUTING.md for information on contributing to this project.
