
ddf-flink's Introduction

DDF with Flink

This project depends on DDF and uses the Apache Flink engine.

DDF

Distributed DataFrame: Productivity = Power x Simplicity For Big Data Scientists & Engineers


Getting Started

This project depends on DDF core, which must be installed first. To get DDF core, clone the DDF repo and check out the master branch.

$ git clone git@github.com:ddf-project/DDF.git
$ cd DDF

Install DDF with:

$ sbt publishLocal

Install ddf-with-flink with:

$ git clone git@github.com:ddf-project/ddf-flink.git
$ cd ddf-flink
$ bin/run-once.sh
$ mvn package install -DskipTests
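The install steps above rely on git, sbt, and mvn being on the PATH. A minimal sketch of a pre-flight check follows; the function name `missing_tools` is hypothetical and not part of ddf-flink.

```shell
#!/bin/sh
# Hypothetical helper (not part of ddf-flink): print the names of any
# tools from the argument list that are not available on PATH.
missing_tools() {
  missing=""
  for tool in "$@"; do
    # `command -v` is the POSIX way to test whether a command exists.
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Strip the leading space so the output is a clean space-separated list.
  echo "$missing" | sed 's/^ //'
}

# Check the tools used by the install steps above.
result=$(missing_tools git sbt mvn)
if [ -n "$result" ]; then
  echo "Missing required tools: $result" >&2
else
  echo "All required tools found."
fi
```

Running this before the install commands gives a single clear message instead of a failure partway through the build.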

Running tests

Tests can be run through either SBT or Maven:

$ sbt test
$ mvn test

To run a single test:

$ sbt "testOnly *FlinkDDFManagerSpec*"

$ mvn test -Dsuites='io.ddf.flink.FlinkDDFManagerSpec'

Starting ddf-shell with the Flink engine

Execute the following only after installing ddf-with-flink:

$ sbt package
$ bin/ddf-shell

Running sbt package is required because it generates the lib_managed directory, which the scripts need in order to run.
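Because the scripts depend on lib_managed, a small guard can confirm it exists before launching them. This is a sketch only: the function name `has_lib_managed` is hypothetical, and it assumes lib_managed sits directly under the project root.

```shell
#!/bin/sh
# Hypothetical guard (not part of ddf-flink): succeed only if
# <project_root>/lib_managed exists and contains at least one entry.
# Assumption: lib_managed is created directly under the project root.
has_lib_managed() {
  dir="${1:-.}/lib_managed"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

if has_lib_managed .; then
  echo "lib_managed found; the scripts under bin/ can be run."
else
  echo "lib_managed missing or empty; run 'sbt package' first." >&2
fi
```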

Running the example

$ sbt package
$ bin/run-flink-example io.ddf.flink.examples.FlinkDDFExample

As with ddf-shell, sbt package must be run first so that lib_managed exists.

Todo

  1. Test the ML method getConfusionMatrix
  2. Implement transformPython and flattenDDF for TransformationHandler and also test the R functions.
  3. Implement the methods r2score, residuals, roc and rmse for MLMetricsSupporter

ddf-flink's People

Contributors

binhmop, ctn, dabaitu, dungnn, huandao0812, khangich, ljzzju, namma, nhanitvn, pangzhi, piccolbo, pzzs, qinxinwei, shiti, trulite


ddf-flink's Issues

Identify the test scenarios

We should review the Spark implementation's test cases, document the scenarios they cover, and write matching test specs for our Flink implementation.

Restrict third-party application access to GitHub repo

Hi @tuplejump/owners, apparently GitHub defaults to open access via OAuth unless you specifically reconfigure it. Please go to https://github.com/organizations/tuplejump/settings/oauth_application_policy and click "Set up application access restrictions", then "Restrict third-party application access".

I ran into this when authorizing spark-packages to connect to ddf-project, then noticed that it proceeds to propose to grant access to all the repos I have access to, which is not a good idea.

Cheers,

https://www.dropbox.com/s/pk6xe0vxfhs06ni/Screenshot%202015-06-17%2000.51.56.png?dl=0

Implement SchemaHandler

This is a prerequisite for the ML RepresentationHandlers. Specifically, these methods need a correct implementation:

  1. getSchema
  2. getColumns
  3. getNumColumns

SQLHandler does not support complex query

The following query throws an IllegalArgumentException:

select * from airlineNA where ( (case when Year is null then 1 else 0 end) + (case when Month is null then 1 else 0 end) + (case when DayofMonth is null then 1 else 0 end) + (case when DayOfWeek is null then 1 else 0 end) + (case when DepTime is null then 1 else 0 end) + (case when CRSDepTime is null then 1 else 0 end) + (case when ArrTime is null then 1 else 0 end) + (case when CRSArrTime is null then 1 else 0 end) + (case when UniqueCarrier is null then 1 else 0 end) + (case when FlightNum is null then 1 else 0 end) + (case when TailNum is null then 1 else 0 end) + (case when ActualElapsedTime is null then 1 else 0 end) + (case when CRSElapsedTime is null then 1 else 0 end) + (case when AirTime is null then 1 else 0 end) + (case when ArrDelay is null then 1 else 0 end) + (case when DepDelay is null then 1 else 0 end) + (case when Origin is null then 1 else 0 end) + (case when Dest is null then 1 else 0 end) + (case when Distance is null then 1 else 0 end) + (case when TaxiIn is null then 1 else 0 end) + (case when TaxiOut is null then 1 else 0 end) + (case when Cancelled is null then 1 else 0 end) + (case when CancellationCode is null then 1 else 0 end) + (case when Diverted is null then 1 else 0 end) + (case when CarrierDelay is null then 1 else 0 end) + (case when WeatherDelay is null then 1 else 0 end) + (case when NASDelay is null then 1 else 0 end) + (case when SecurityDelay is null then 1 else 0 end) + (case when LateAircraftDelay is null then 1 else 0 end) )< 1

The error message is:

 java.lang.IllegalArgumentException: Cannot parse [select * from airlineNA where ( (case when Year is null then 1 else 0 end) + (case when Month is null then 1 else 0 end) + (case when DayofMonth is null then 1 else 0 end) + (case when DayOfWeek is null then 1 else 0 end) + (case when DepTime is null then 1 else 0 end) + (case when CRSDepTime is null then 1 else 0 end) + (case when ArrTime is null then 1 else 0 end) + (case when CRSArrTime is null then 1 else 0 end) + (case when UniqueCarrier is null then 1 else 0 end) + (case when FlightNum is null then 1 else 0 end) + (case when TailNum is null then 1 else 0 end) + (case when ActualElapsedTime is null then 1 else 0 end) + (case when CRSElapsedTime is null then 1 else 0 end) + (case when AirTime is null then 1 else 0 end) + (case when ArrDelay is null then 1 else 0 end) + (case when DepDelay is null then 1 else 0 end) + (case when Origin is null then 1 else 0 end) + (case when Dest is null then 1 else 0 end) + (case when Distance is null then 1 else 0 end) + (case when TaxiIn is null then 1 else 0 end) + (case when TaxiOut is null then 1 else 0 end) + (case when Cancelled is null then 1 else 0 end) + (case when CancellationCode is null then 1 else 0 end) + (case when Diverted is null then 1 else 0 end) + (case when CarrierDelay is null then 1 else 0 end) + (case when WeatherDelay is null then 1 else 0 end) + (case when NASDelay is null then 1 else 0 end) + (case when SecurityDelay is null then 1 else 0 end) + (case when LateAircraftDelay is null then 1 else 0 end) )< 1] because `)' expected but `w' found
[info]   at io.ddf.flink.content.SqlSupport$TableDdlParser.parse(SqlSupport.scala:302)
[info]   at io.ddf.flink.etl.SqlHandler.parse(SqlHandler.scala:26)
[info]   at io.ddf.flink.etl.SqlHandler.sql2ddf(SqlHandler.scala:29)
[info]   at io.ddf.flink.etl.SqlHandler.sql2ddf(SqlHandler.scala:154)
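The parser message (`)' expected but `w' found) suggests the failure may occur at the first `when`, so a much shorter query of the same shape might reproduce it. As a sketch for narrowing this down, the helper below builds the same null-count query for any list of columns; the function name `null_count_query` is hypothetical and not part of ddf-flink.

```shell
#!/bin/sh
# Hypothetical helper (not part of ddf-flink): build the null-count query
# from this issue for an arbitrary table and list of columns, so the
# failing case can be reduced one column at a time.
null_count_query() {
  table="$1"
  shift
  expr=""
  for col in "$@"; do
    term="(case when $col is null then 1 else 0 end)"
    if [ -z "$expr" ]; then
      expr="$term"
    else
      expr="$expr + $term"
    fi
  done
  # Same layout as the query in the issue: where ( <terms> )< 1
  echo "select * from $table where ( $expr )< 1"
}

# Smallest candidates for a reproduction, using columns from the issue.
null_count_query airlineNA Year
null_count_query airlineNA Year Month
```

If the one-column query already fails to parse, the problem is likely the `case when` expression itself rather than the length of the query.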
