# DDF with Flink

This project depends on DDF and uses the Apache Flink engine.
## DDF

Distributed DataFrame: Productivity = Power x Simplicity for Big Data Scientists & Engineers
## Getting Started
This project depends on DDF v1.4.0-SNAPSHOT, which must be installed before this project can run. To get DDF 1.4.0-SNAPSHOT, clone the DDF repo and check out the `tuplejump-integration` branch:

```
$ git clone git@github.com:ddf-project/DDF.git
$ cd DDF
$ git fetch
$ git checkout tuplejump-integration
```
No changes are required when installing DDF using Maven.

Before installing DDF using SBT, add a new line after line 482 in `project/RootBuild.scala` (don't miss adding the comma at the end of line 482):

```scala
),
publishArtifact in (Compile, packageDoc) := false
```

This avoids an error when publishing docs through SBT.
DDF can then be installed with:

```
$ bin/run-once.sh

# using Maven
$ mvn package install -DskipTests

# or using SBT
$ sbt publishLocal
```
## Installing ddf-with-flink

```
$ git clone git@github.com:tuplejump/ddf-with-flink.git
$ cd ddf-with-flink
$ bin/run-once.sh
$ mvn package install -DskipTests
```
## Running tests

Tests can be run through either SBT or Maven:

```
$ sbt test
$ mvn test

# running a single test
$ sbt "testOnly *FlinkDDFManagerSpec*"
$ mvn test -Dsuites='io.ddf.flink.FlinkDDFManagerSpec'
```
## ddf-shell with the Flink engine

### Starting

Execute the following only after installing ddf-with-flink:

```
$ sbt package
$ bin/ddf-shell
```

`sbt package` is required since it generates the `lib_managed` directory, which is needed for running the scripts.
Running the example:

```
$ sbt package
$ bin/run-flink-example io.ddf.flink.examples.FlinkDDFExample
```
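For orientation, here is a hedged sketch of what a session against the Flink engine might look like. It assumes the generic DDF API (`DDFManager.get`, `sql`, `sql2ddf`) exposed by the DDF project; the actual entry points used by `FlinkDDFExample` may differ, so treat every identifier below as an assumption to verify against the example source.

```scala
import io.ddf.DDFManager

object FlinkSessionSketch extends App {
  // Hypothetical sketch, not taken from FlinkDDFExample: obtain a
  // DDFManager backed by the Flink engine (engine name assumed to be "flink").
  val manager = DDFManager.get("flink")

  // Create and populate a table via the manager's SQL interface
  // (table name and schema are illustrative placeholders).
  manager.sql("CREATE TABLE airline (...)") // schema elided
  manager.sql("LOAD '...' INTO airline")    // data path elided

  // sql2ddf is assumed to return a DDF (distributed DataFrame)
  // backed by a Flink DataSet, which supports further operations.
  val ddf = manager.sql2ddf("SELECT origin, count(*) FROM airline GROUP BY origin")
  println(ddf.getNumRows)
}
```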
#### Todo

- Test the ML method `getConfusionMatrix`
- Implement `transformPython` and `flattenDDF` for TransformationHandler, and also test the R functions
- Implement the methods `r2score`, `residuals`, `roc` and `rmse` for MLMetricsSupporter