jnb2docker

Converts Java Jupyter notebooks (using the IJava kernel) into Docker images.

Coding conventions

Under the hood, JShell executes the code from the notebook. However, JShell requires a certain coding style; not all Java code that compiles with javac will work. Statements that normally don't need to be enclosed in curly brackets have to be written with them, otherwise JShell won't know that more code is to come.

This code works:

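if (condition) {
  doSomething();
} else {
  doSomethingElse();
}

This does not:

if (condition)
  doSomething();
else
  doSomethingElse();

This one does not work either, because JShell already considers the if statement complete at its closing bracket and then treats the else as the start of a new, invalid statement:

if (condition) {
  doSomething();
}
else {
  doSomethingElse();
}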
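To see why the bracket placement matters, the sketch below feeds a script to JShell's own completeness analysis one line at a time and collects the snippets JShell would hand off for evaluation. It relies on the standard jdk.jshell API (SourceCodeAnalysis.analyzeCompletion) and assumes the script is consumed line by line; the class name SnippetSplitter is hypothetical, and this is not how jnb2docker itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import jdk.jshell.JShell;
import jdk.jshell.SourceCodeAnalysis;

// Hypothetical helper (not part of jnb2docker): splits a script into the
// snippets JShell would see if the script were fed to it line by line.
class SnippetSplitter {

    static List<String> split(String script) {
        List<String> snippets = new ArrayList<>();
        // The "local" engine avoids launching a separate JVM; only the syntactic
        // completeness analysis is used here, nothing gets executed.
        try (JShell shell = JShell.builder().executionEngine("local").build()) {
            SourceCodeAnalysis analysis = shell.sourceCodeAnalysis();
            StringBuilder pending = new StringBuilder();
            for (String line : script.split("\n")) {
                pending.append(line).append('\n');
                SourceCodeAnalysis.CompletionInfo info =
                        analysis.analyzeCompletion(pending.toString());
                // Once the accumulated lines form a complete snippet (or one JShell
                // gives up on because of errors), they are handed off, so a following
                // "else" line ends up in a separate snippet.
                if (info.completeness().isComplete()) {
                    snippets.add(pending.toString().trim());
                    pending.setLength(0);
                }
            }
            // Whatever is left over was never completed.
            if (!pending.toString().trim().isEmpty()) {
                snippets.add(pending.toString().trim());
            }
        }
        return snippets;
    }

    public static void main(String[] args) {
        String sameLine = "if (condition) {\n  doSomething();\n} else {\n  doSomethingElse();\n}";
        String ownLine = "if (condition) {\n  doSomething();\n}\nelse {\n  doSomethingElse();\n}";
        System.out.println("else on the bracket line: " + split(sameLine));
        System.out.println("else on its own line:     " + split(ownLine));
    }
}
```

With "} else {" on one line the whole statement stays a single snippet, whereas putting else on its own line yields a separate snippet that starts with else and cannot compile on its own.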

To declare the dependencies that jnb2docker should extract, use the following line magics in your notebook:

  • %maven ... -- for specifying a single Maven dependency, e.g.:

    %maven nz.ac.waikato.cms.weka:weka-dev:3.9.4

  • %jars ... -- for specifying external jars, e.g. a single one:

    %jars /some/where/multisearch-weka-package-2020.2.17.jar

    Or all jars in a directory:

    %jars C:/some/where/*.jar
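Since the magics are plain text lines in a code cell, picking them out is straightforward. The following is a hypothetical illustration, not jnb2docker's actual parser: the class MagicsExtractor and its methods are invented for the example. It collects %maven and %jars lines and turns a groupId:artifactId:version coordinate into a Maven dependency element like the one shown under Releases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not jnb2docker's own parser): collects %maven and
// %jars line magics from the source of a notebook code cell.
class MagicsExtractor {

    final List<String> mavenCoordinates = new ArrayList<>();
    final List<String> jarPaths = new ArrayList<>();

    // Checks every line of the cell; lines without one of the two magics are ignored.
    void scan(String cellSource) {
        for (String raw : cellSource.split("\n")) {
            String line = raw.trim();
            if (line.startsWith("%maven ")) {
                mavenCoordinates.add(line.substring("%maven ".length()).trim());
            } else if (line.startsWith("%jars ")) {
                // Wildcards such as C:/some/where/*.jar are kept verbatim, not expanded.
                jarPaths.add(line.substring("%jars ".length()).trim());
            }
        }
    }

    // Turns "groupId:artifactId:version" into a Maven <dependency> element.
    // Assumption: only the three-part coordinate form is supported.
    static String toDependencyXml(String coordinate) {
        String[] parts = coordinate.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "expected groupId:artifactId:version, got " + coordinate);
        }
        return "<dependency>\n"
                + "  <groupId>" + parts[0] + "</groupId>\n"
                + "  <artifactId>" + parts[1] + "</artifactId>\n"
                + "  <version>" + parts[2] + "</version>\n"
                + "</dependency>";
    }

    public static void main(String[] args) {
        MagicsExtractor extractor = new MagicsExtractor();
        extractor.scan("%maven nz.ac.waikato.cms.weka:weka-dev:3.9.4\n"
                + "%jars /some/where/multisearch-weka-package-2020.2.17.jar\n"
                + "System.out.println(\"hello\");");
        for (String coordinate : extractor.mavenCoordinates) {
            System.out.println(toDependencyXml(coordinate));
        }
        System.out.println("jars: " + extractor.jarPaths);
    }
}
```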
    

Command-line

Converts Java Jupyter notebooks into Docker images.


Usage: [--help] [-m MAVEN_HOME] [-u MAVEN_USER_SETTINGS]
       [-j JAVA_HOME] [-v JVM...] -i INPUT
       -b DOCKER_BASE_IMAGE [-I DOCKER_INSTRUCTIONS]
       -o OUTPUT_DIR

Options:
-m, --maven_home MAVEN_HOME
	The directory with a local Maven installation to use instead of the
	bundled one.

-u, --maven_user_settings MAVEN_USER_SETTINGS
	The file with the maven user settings to use other than
	$HOME/.m2/settings.xml.

-j, --java_home JAVA_HOME
	The Java home to use for the Maven execution.

-v, --jvm JVM
	The parameters to pass to the JVM before launching the application.

-i, --input INPUT
	The Java Jupyter notebook to convert.

-b, --docker_base_image DOCKER_BASE_IMAGE
	The docker base image to use, e.g. 'openjdk:11-jdk-slim-buster'.

-I, --docker_instructions DOCKER_INSTRUCTIONS
	File with additional docker instructions to use for generating the
	Dockerfile.

-o, --output_dir OUTPUT_DIR
	The directory to output the bootstrapped application, JShell script and
	Dockerfile in.

Example

For this example we use the weka_filter_pipeline.ipynb notebook and the additional weka_filter_pipeline.dockerfile Docker instructions. This notebook contains a simple Weka filter setup, using the InterquartileRange filter to remove outliers and extreme values from an input file and saving the cleaned dataset as a new file.
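The idea behind interquartile-range filtering fits in a few lines. The sketch below is a simplified, self-contained illustration and not Weka's InterquartileRange implementation: it computes the first and third quartiles of a single numeric column by linear interpolation and keeps only values within factor times the IQR of them. The quartile definition, the class name IqrSketch and the factor of 3.0 are assumptions made for the example; Weka's filter works on whole datasets and distinguishes outliers from extreme values, which this sketch does not.

```java
import java.util.Arrays;

// Simplified illustration of IQR-based outlier removal on one numeric column;
// not Weka's InterquartileRange implementation.
class IqrSketch {

    // p-th quantile of an already sorted array, interpolating linearly between
    // the two nearest ranks (one of several common quartile definitions).
    static double quantile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lower = (int) Math.floor(pos);
        int upper = (int) Math.ceil(pos);
        return sorted[lower] + (pos - lower) * (sorted[upper] - sorted[lower]);
    }

    // Keeps only the values inside [Q1 - factor * IQR, Q3 + factor * IQR].
    static double[] keepInliers(double[] values, double factor) {
        if (values.length == 0) {
            return values;
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double q1 = quantile(sorted, 0.25);
        double q3 = quantile(sorted, 0.75);
        double iqr = q3 - q1;
        double low = q1 - factor * iqr;
        double high = q3 + factor * iqr;
        return Arrays.stream(values).filter(v -> v >= low && v <= high).toArray();
    }

    public static void main(String[] args) {
        double[] column = {1, 2, 3, 4, 5, 6, 7, 8, 100};
        double[] kept = keepInliers(column, 3.0); // factor 3.0 is an assumption
        System.out.println(column.length + " -> " + kept.length + ": " + Arrays.toString(kept));
    }
}
```

For this toy column Q1 = 3 and Q3 = 7, so the upper bound is 7 + 3 * 4 = 19 and the value 100 is dropped, leaving 8 of the 9 values.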

The command-lines for this example assume this directory structure:

/some/where
|
+- data
|  |
|  +- jnb2docker   // contains the jar
|  |
|  +- notebooks
|  |  |
|  |  +- weka_filter_pipeline.ipynb       // actual notebook
|  |  |
|  |  +- weka_filter_pipeline.dockerfile  // additional Dockerfile instructions
|  |
|  +- in
|  |  |
|  |  +- bolts.arff   // raw dataset to filter
|  |
|  +- out
|
+- output
|  |
|  +- wekaiqrcleaner  // will contain all the generated data, including "Dockerfile"

For our Dockerfile, we use the openjdk:11-jdk-slim-buster base image (-b), which contains an OpenJDK 11 installation on top of a Debian "buster" image. The weka_filter_pipeline.ipynb notebook (-i) then gets turned into code for JShell using the following command-line:

java -jar /some/where/data/jnb2docker/jnb2docker-0.0.3-spring-boot.jar \
  -i /some/where/data/notebooks/weka_filter_pipeline.ipynb \
  -o /some/where/output/wekaiqrcleaner \
  -b openjdk:11-jdk-slim-buster \
  -I /some/where/data/notebooks/weka_filter_pipeline.dockerfile  
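If the conversion is driven from a Java program rather than a shell, the same command-line can be passed to ProcessBuilder as an argument list, which avoids shell quoting and line-continuation issues altogether. This is a sketch using only the JDK; the class Jnb2DockerLauncher is hypothetical and the paths are the placeholders from the example above.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical launcher: assembles the jnb2docker command-line from the example.
class Jnb2DockerLauncher {

    // Builds the argument list; each option and its value are separate elements,
    // so no shell escaping is involved.
    static List<String> command(String jar, String notebook, String outputDir,
                                String baseImage, String dockerInstructions) {
        return List.of(
                "java", "-jar", jar,
                "-i", notebook,
                "-o", outputDir,
                "-b", baseImage,
                "-I", dockerInstructions);
    }

    // Runs the command, forwarding its console output, and returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    public static void main(String[] args) {
        List<String> cmd = command(
                "/some/where/data/jnb2docker/jnb2docker-0.0.3-spring-boot.jar",
                "/some/where/data/notebooks/weka_filter_pipeline.ipynb",
                "/some/where/output/wekaiqrcleaner",
                "openjdk:11-jdk-slim-buster",
                "/some/where/data/notebooks/weka_filter_pipeline.dockerfile");
        System.out.println(String.join(" ", cmd));
        // Once the paths exist on your machine, the conversion could be started with run(cmd).
    }
}
```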

Now we build the Docker image called wekaiqrcleaner from the Dockerfile that was generated in the output directory /some/where/output/wekaiqrcleaner (the -o option in the previous command-line):

cd /some/where/output/wekaiqrcleaner
sudo docker build -t wekaiqrcleaner .

With the image built, we can now push the raw ARFF file through for cleaning. To do this, we map the in/out directories from our directory structure into the Docker container (using the -v option) and supply the input and output files via the INPUT and OUTPUT environment variables (using the -e option). To see a few more messages, we also turn on the notebook's debugging output via the VERBOSE environment variable:

sudo docker run -ti \
  -v /some/where/data/in:/data/in \
  -v /some/where/data/out:/data/out \
  -e INPUT=/data/in/bolts.arff \
  -e OUTPUT=/data/out/bolts-clean.arff \
  -e VERBOSE=true \
  wekaiqrcleaner
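Inside the container, the notebook code has to pick these settings up from the environment. The notebook's own code is not reproduced here; the following is a hypothetical sketch of how JShell code can read INPUT, OUTPUT and VERBOSE. The class PipelineSettings is invented for the example, and it takes the environment as a Map (System.getenv() in the container) so that it can be tried without Docker.

```java
import java.util.Map;

// Hypothetical settings holder for the INPUT/OUTPUT/VERBOSE variables used above.
class PipelineSettings {
    final String input;
    final String output;
    final boolean verbose;

    private PipelineSettings(String input, String output, boolean verbose) {
        this.input = input;
        this.output = output;
        this.verbose = verbose;
    }

    // Reads the settings from the given environment; INPUT and OUTPUT are required.
    static PipelineSettings from(Map<String, String> env) {
        String input = env.get("INPUT");
        String output = env.get("OUTPUT");
        if (input == null || output == null) {
            throw new IllegalStateException("INPUT and OUTPUT environment variables must be set");
        }
        // Boolean.parseBoolean returns false for null or anything other than "true" (any case).
        boolean verbose = Boolean.parseBoolean(env.get("VERBOSE"));
        return new PipelineSettings(input, output, verbose);
    }

    public static void main(String[] args) {
        // In the container you would pass System.getenv(), which "docker run -e" populates;
        // fixed values are used here so the example runs anywhere.
        PipelineSettings settings = from(Map.of(
                "INPUT", "/data/in/bolts.arff",
                "OUTPUT", "/data/out/bolts-clean.arff",
                "VERBOSE", "true"));
        if (settings.verbose) {
            System.out.println("Reading " + settings.input + ", writing " + settings.output);
        }
    }
}
```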

From the debugging messages you can see that the initial dataset with 40 rows of data gets reduced to 36 rows.

Disclaimer: This is just a simple notebook tailored to the UCI dataset bolts.arff.

Releases

Maven

    <dependency>
      <groupId>com.github.fracpete</groupId>
      <artifactId>jnb2docker</artifactId>
      <version>0.0.5</version>
    </dependency>
