dive's Introduction

dive

dive is my Hive-like project. It accepts SQL-based DML and DDL and runs the data processing on Hadoop.

Current work

See issues https://github.com/silentdai/dive/issues

Milestone 0.1
SQL input: supports natural joins, predicates, GROUP BY, and projection.
Data processing: each SELECT runs N + X MapReduce jobs, where N is the number of 2-way join jobs and X is the number of aggregation or projection jobs.
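The job count can be illustrated with a small planning sketch. It assumes the joins form a left-deep chain in which each 2-way join is one MapReduce job, and that every aggregation or projection stage after the joins is one more job. The class and method names (`JobPlanSketch`, `planSelect`) are hypothetical and do not come from dive's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner sketch (not dive's code): N join jobs, then X post-join jobs.
public class JobPlanSketch {

    public static List<String> planSelect(List<String> tables, List<String> postJoinStages) {
        List<String> jobs = new ArrayList<>();
        // Left-deep chain: the running result is joined with the next table,
        // so k tables need k - 1 two-way join jobs (this is N).
        String left = tables.get(0);
        for (int i = 1; i < tables.size(); i++) {
            String right = tables.get(i);
            jobs.add("join(" + left + ", " + right + ")");
            left = left + "+" + right;
        }
        // Each aggregation or projection stage is one more job (this is X).
        jobs.addAll(postJoinStages);
        return jobs;
    }

    public static void main(String[] args) {
        List<String> jobs = planSelect(List.of("a", "b", "c"), List.of("groupBy"));
        System.out.println(jobs.size() + " jobs: " + jobs);
    }
}
```

Under these assumptions, a SELECT over three tables with one GROUP BY runs 2 + 1 = 3 jobs.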
History

The idea originated from the final project of a database course. I wanted practice implementing SQL and MapReduce.

The original repository is github.com/silentdai/mapred_jobs. Now that dive has grown to more than 4,000 lines, it should be kept independent of other experimental code.


dive's Issues

Investigation of Tez

Current: each SELECT runs N + X MapReduce jobs. Each job writes its result to HDFS, but only the last result is needed.
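The following sketch shows where the extra HDFS writes come from. It assumes each job reads the previous job's output directory, so every job except the last writes an intermediate result that is read once by the next job and then thrown away. The class name, the `stage-i` directory naming, and the paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: wiring a chain of MapReduce jobs through HDFS directories.
public class JobChainSketch {

    // One step of the chain: the job reads inputPath and writes outputPath.
    public static final class Step {
        public final String job, inputPath, outputPath;
        public final boolean temporary; // true if only the next job reads this output

        Step(String job, String inputPath, String outputPath, boolean temporary) {
            this.job = job;
            this.inputPath = inputPath;
            this.outputPath = outputPath;
            this.temporary = temporary;
        }

        @Override
        public String toString() {
            return job + ": " + inputPath + " -> " + outputPath;
        }
    }

    public static List<Step> wire(List<String> jobs, String input, String finalOutput, String tmpRoot) {
        List<Step> steps = new ArrayList<>();
        String current = input;
        for (int i = 0; i < jobs.size(); i++) {
            boolean last = (i == jobs.size() - 1);
            // Every job except the last writes to a scratch directory on HDFS.
            String out = last ? finalOutput : tmpRoot + "/stage-" + i;
            steps.add(new Step(jobs.get(i), current, out, !last));
            current = out;
        }
        return steps;
    }

    public static void main(String[] args) {
        List<Step> steps = wire(List.of("join1", "join2", "groupBy"), "/data/in", "/data/out", "/tmp/dive");
        for (Step s : steps) {
            System.out.println(s);
        }
        long scratch = steps.stream().filter(s -> s.temporary).count();
        System.out.println(scratch + " intermediate results written to HDFS");
    }
}
```

With N + X jobs, N + X - 1 intermediate results are materialized; avoiding those writes is the motivation for looking at Tez.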

Possible steps:

Read the YARN book
Write an example ApplicationMaster
Write an example Tez application
Run dive on Tez

Create Operator Tree

dive has no execution planner; it only runs hand-written plans.
Once an operator tree is constructed, the following work can start:

Subqueries (SelectJob does not support sub-select jobs)
Optimizations such as projection and selection push-down
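Here is a minimal sketch of what an operator tree enables. It assumes hypothetical `Scan`, `Filter`, and `Join` operators (not dive's classes) and an equality predicate on a single column, and it shows only selection push-down: a `Filter` above a natural `Join` moves to the input that owns its column. A column present on both sides (the join key) stays above the join in this sketch. Projection push-down is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical operator-tree sketch; dive's real classes may differ.
public class OperatorTreeSketch {

    public interface Operator {
        List<String> columns(); // output columns of this operator
    }

    public static final class Scan implements Operator {
        public final String table;
        public final List<String> cols;
        public Scan(String table, List<String> cols) { this.table = table; this.cols = cols; }
        @Override public List<String> columns() { return cols; }
        @Override public String toString() { return "Scan(" + table + ")"; }
    }

    // Selection with an equality predicate "column = value".
    public static final class Filter implements Operator {
        public final String column, value;
        public final Operator child;
        public Filter(String column, String value, Operator child) {
            this.column = column; this.value = value; this.child = child;
        }
        @Override public List<String> columns() { return child.columns(); }
        @Override public String toString() { return "Filter(" + column + "=" + value + ", " + child + ")"; }
    }

    // Natural join: shared columns appear once in the output.
    public static final class Join implements Operator {
        public final Operator left, right;
        public Join(Operator left, Operator right) { this.left = left; this.right = right; }
        @Override public List<String> columns() {
            List<String> all = new ArrayList<>(left.columns());
            for (String c : right.columns()) {
                if (!all.contains(c)) all.add(c);
            }
            return all;
        }
        @Override public String toString() { return "Join(" + left + ", " + right + ")"; }
    }

    // Rewrites the tree so filters sit as close to the scans as possible.
    public static Operator pushDown(Operator op) {
        if (op instanceof Filter) {
            Filter f = (Filter) op;
            Operator child = pushDown(f.child);
            if (child instanceof Join) {
                Join j = (Join) child;
                boolean inLeft = j.left.columns().contains(f.column);
                boolean inRight = j.right.columns().contains(f.column);
                // Push only when exactly one input produces the column.
                if (inLeft && !inRight) return new Join(pushDown(new Filter(f.column, f.value, j.left)), j.right);
                if (inRight && !inLeft) return new Join(j.left, pushDown(new Filter(f.column, f.value, j.right)));
            }
            return new Filter(f.column, f.value, child);
        }
        if (op instanceof Join) {
            Join j = (Join) op;
            return new Join(pushDown(j.left), pushDown(j.right));
        }
        return op;
    }

    public static void main(String[] args) {
        Operator users = new Scan("users", List.of("uid", "city"));
        Operator orders = new Scan("orders", List.of("oid", "uid", "amount"));
        Operator plan = new Filter("city", "Paris", new Join(users, orders));
        System.out.println(plan);
        System.out.println(pushDown(plan));
    }
}
```

In the example, the filter on `city` moves below the join onto the `users` scan, so fewer rows reach the join job.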

Use Row as the Key/Value

Current: Row is serialized as a BytesWritable. Each time dive reads or writes a Row, it creates a new byte array. A possible solution is to create the Row instance directly from a DataInput. (Writing to the output is easy, because dive does not need to write the schema into it.)

Possible Solution
Rewrite the Mapper/Reducer or the upper-level task generator.
As long as the schema is set up before the Row is deserialized, we can use Row#read(in).
Possibly by rewriting the task generator.
See http://avro.apache.org/docs/1.7.6/mr.html
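```java
// Old-API org.apache.avro.mapred.AvroJob; conf is the job's JobConf, User an Avro-generated class
AvroJob.setInputSchema(conf, User.getClassSchema());
```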
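The schema-first idea can be sketched with plain `java.io` (not Avro and not Hadoop's `Writable`). The `Row` class, its `Type` enum, and the `read`/`write` method names are hypothetical, chosen to mirror the `Row#read(in)` mentioned above. The schema is fixed once when the Row is constructed, only field values go to the stream, and one Row instance is refilled in place for every record instead of allocating a new byte array each time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical Row sketch: schema set once, instance reused for every record.
public class RowSketch {

    public enum Type { INT, STRING }

    public static final class Row {
        private final Type[] schema;
        private final Object[] values;

        // The schema must be known before deserializing; it is never written to the stream.
        public Row(Type[] schema) {
            this.schema = schema;
            this.values = new Object[schema.length];
        }

        public Object get(int i) { return values[i]; }
        public void set(int i, Object v) { values[i] = v; }

        // Writes only the field values, in schema order.
        public void write(DataOutput out) throws IOException {
            for (int i = 0; i < schema.length; i++) {
                if (schema[i] == Type.INT) {
                    out.writeInt((Integer) values[i]);
                } else {
                    out.writeUTF((String) values[i]);
                }
            }
        }

        // Refills this instance in place from the stream.
        public void read(DataInput in) throws IOException {
            for (int i = 0; i < schema.length; i++) {
                if (schema[i] == Type.INT) {
                    values[i] = in.readInt();
                } else {
                    values[i] = in.readUTF();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Type[] schema = { Type.INT, Type.STRING };
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        Row row = new Row(schema);
        for (int id = 1; id <= 3; id++) {
            row.set(0, id);
            row.set(1, "user" + id);
            row.write(out);
        }
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        Row reused = new Row(schema); // one instance for all records
        for (int n = 0; n < 3; n++) {
            reused.read(in);
            System.out.println(reused.get(0) + " " + reused.get(1));
        }
    }
}
```

In a MapReduce job, the same pattern would require the schema to be available to the task (for example from the job configuration) before the first record is read, which is the role `AvroJob.setInputSchema` plays for Avro.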
