Capuchin

Distributed CSV Query Engine

Status: mostly nonsense

Install

$ git clone git@github.com:EwanValentine/capuchin.git
$ cd capuchin && go install
$ capuchin 

Commands

$ capuchin start # starts both the gRPC server and the HTTP proxy server
$ capuchin grpc  # starts just the gRPC server
$ capuchin http  # starts just the HTTP server; of little use alone, since it proxies to the gRPC server

Query API

Test dataset example:

order_id,user_id,date
abc123,abc123,2021-09-01
def456,def123,2021-09-02
abc123,abc123,2021-09-03

Query the test dataset using HTTPie:

$ http post localhost:9999/v1/query \
  select:='["user_id"]' \
  where="user_id = abc123" \
  source="./query/test-data.csv"
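The same request can be built from Go. This is a minimal sketch, assuming the endpoint accepts the JSON body that HTTPie produces from the fields above; `QueryRequest`, `encodeQuery` and `newQueryRequest` are hypothetical helpers written for this example, not part of Capuchin.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// QueryRequest mirrors the three fields sent in the HTTPie example.
// Hypothetical helper type: Capuchin may define its own request type.
type QueryRequest struct {
	Select []string `json:"select"`
	Where  string   `json:"where"`
	Source string   `json:"source"`
}

// encodeQuery serialises the query to the JSON body sent over HTTP.
func encodeQuery(q QueryRequest) ([]byte, error) {
	return json.Marshal(q)
}

// newQueryRequest builds, but does not send, a POST to /v1/query.
func newQueryRequest(baseURL string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/query", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	body, err := encodeQuery(QueryRequest{
		Select: []string{"user_id"},
		Where:  "user_id = abc123",
		Source: "./query/test-data.csv",
	})
	if err != nil {
		log.Fatal(err)
	}

	req, err := newQueryRequest("http://localhost:9999", body)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(req.Method, req.URL.String())
	fmt.Println(string(body))
	// With a server running, send it with: http.DefaultClient.Do(req)
}
```

The request is only constructed here, so the sketch runs without a server; the response format is not documented above and is left out.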

Example (Library)

Example using Capuchin as a library:

s := source.NewFileSource()
fileSource, err := s.Load("./query/test-data.csv")
if err != nil {
  log.Panic(err)
}

// Named q rather than query so the variable does not shadow the query package.
q := &query.Query{
  Select: []string{"user_id", "date"},
  Where:  "user_id = abc123",
}
q.Source(fileSource)

results, err := q.Exec()
if err != nil {
  log.Panic(err)
}

log.Println(results)
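To make the semantics of that query concrete, here is a standalone sketch that applies the same select and where to the test dataset using only the standard library. It assumes the where clause is a plain equality filter on one column; `selectWhere` is a hypothetical function written for illustration and says nothing about how Capuchin evaluates queries or shapes its results.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strings"
)

// testData is the test dataset from the Query API section.
const testData = `order_id,user_id,date
abc123,abc123,2021-09-01
def456,def123,2021-09-02
abc123,abc123,2021-09-03
`

// selectWhere returns the requested columns of every row in which
// filterColumn equals value. Hypothetical helper, not Capuchin's API.
func selectWhere(data string, columns []string, filterColumn, value string) ([][]string, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("empty dataset")
	}

	// Map header names to column positions.
	index := make(map[string]int, len(records[0]))
	for i, name := range records[0] {
		index[name] = i
	}

	filterIdx, ok := index[filterColumn]
	if !ok {
		return nil, fmt.Errorf("unknown column %q", filterColumn)
	}
	selected := make([]int, len(columns))
	for i, name := range columns {
		idx, ok := index[name]
		if !ok {
			return nil, fmt.Errorf("unknown column %q", name)
		}
		selected[i] = idx
	}

	// Keep matching rows, projected onto the selected columns.
	var out [][]string
	for _, row := range records[1:] {
		if row[filterIdx] != value {
			continue
		}
		projected := make([]string, len(selected))
		for i, idx := range selected {
			projected[i] = row[idx]
		}
		out = append(out, projected)
	}
	return out, nil
}

func main() {
	rows, err := selectWhere(testData, []string{"user_id", "date"}, "user_id", "abc123")
	if err != nil {
		log.Fatal(err)
	}
	for _, row := range rows {
		fmt.Println(strings.Join(row, ","))
	}
}
```

On the test dataset this keeps the first and third rows, since the second row has user_id def123.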

Data Management

  1. Point your Capuchin cluster at your data lake and specify which column in your data is the date key.
  2. Each Capuchin node creates a shard by loading a batch of the data into memory for a date range. For example, node 0 might load 20190101 to 20190801; a sharding algorithm divides the data between the nodes automatically.
