deepjavalibrary / d2l-java

The Java implementation of Dive into Deep Learning (D2L.ai)

Home Page: https://d2l.djl.ai

License: Other

Languages: Jupyter Notebook 93.38%, Java 6.23%, Shell 0.25%, Python 0.14%

Topics: machine-learning, deep-learning, book, computer-vision, natural-language-processing, java, kaggle, data-science, mxnet, tensorflow, pytorch, djl, nlp, d2l, jupyter-notebook

d2l-java's Introduction

Dive into Deep Learning (Java version)

This project is adapted from the original Dive into Deep Learning book by Aston Zhang, Zachary C. Lipton, Mu Li, Alex J. Smola, and the community contributors. GitHub of the original book: https://github.com/d2l-ai/d2l-en. We have adapted the book to use Java and the Deep Java Library (DJL).

All the notebooks here can be downloaded and run using the Java kernel. We have also compiled the book into a website.

This project is currently being developed and maintained by AWS and the DJL community.

How to run Jupyter Notebook in Java

Online

You can run the notebooks online by clicking: Binder

Or on Colab: Colab

Local

Please follow the instructions here on how to run the notebooks using the Java kernel.

How to contribute to this book

Please follow the contributor guide here.

We have the following chapters implemented

About Deep Java Library

Deep Java Library (DJL) is a deep learning framework written in Java, supporting both training and inference. DJL is built on top of modern deep learning frameworks (TensorFlow, PyTorch, MXNet, etc.). You can easily use DJL to train your model or deploy your favorite models from a variety of engines without any additional conversion. It contains a powerful ModelZoo design that allows you to manage trained models and load them in a single line. The built-in ModelZoo currently supports more than 70 pre-trained, ready-to-use models from GluonCV, HuggingFace, TorchHub, and Keras.

Follow our GitHub, demo repository, Slack channel, and Twitter for more documentation and examples of DJL!

d2l-java's People

Contributors

aksrajvanshi, amazon-auto, appsofteng, frankfliu, jt70, juliangamble, keerthanvasist, kexinfeng, kimim, lanking520, markbookk, raymondkhliu, roywei, sindhuvahinis, stu1130, takanori-ugai, tosterberg, viclzhu, xulunfan, xyang16, zachgk


d2l-java's Issues

CI constantly fails on lenet.ipynb

Error msg:

[NbConvertApp] Converting notebook chapter_convolutional-neural-networks/lenet.ipynb to html
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop starting...
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop started.
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop starting...
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop started.
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop starting...
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop started.
[NbConvertApp] Executing notebook with kernel: java
Nov 16, 2020 7:33:16 PM io.github.spencerpark.ijava.JavaKernel formatError
WARNING: 
jdk.jshell.EvalException: MXNet engine call failed: MXNetError: Check failed: ctx.dev_mask() == Context::kGPU (-1073741696 vs. 2) :
Stack trace:
  File "src/resource.cc", line 138

Bug in Sections 9.7, 10.4, and 10.7 of D2L-DJL

Description

There seems to be a bug in the code for these sections that prevents me from replicating the loss results of the D2L Python book using DJL. I have tested various methods, discussed possible solutions with Zach, Frank, and Lai, and tried all of their suggestions without much result...

Environment

I tested my theories and the suggestions given for section 10.4 using this code (https://github.com/markbookk/java-d2l-IDE/tree/main/section10_4). I also set all initializers to ONES on both the Java and Python sides to eliminate the randomness of Xavier (or similar) initialization.

I haven’t completed section 10.7 since these three sections share the same code base. I originally saw this problem while implementing section 9.7; it later reappeared in section 10.4, so it will presumably occur in section 10.7 as well.

Problem

The problem occurs during training: the result of the loss function diverges from the expected (Python-side) result as training continues. I debugged and tried multiple things (which I’ll mention below) and noticed the problem appears after calling backward. Before calling backward, the NDArrays of the parameters and the gradients of each parameter are exactly the same as on the Python side. The loss function also matches, up to a slight difference of about 0.0001, which I attribute to Java floats carrying less precision than Python’s doubles. Although the loss sum matches, the prediction result is 0, so that may have something to do with my testing; I tried multiple things but could not find a way to change this result while keeping the Python and Java environments identical for comparison.
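The small discrepancy mentioned above is consistent with a precision difference: DJL NDArrays typically hold 32-bit floats, while Python-side sums are often computed in 64-bit doubles. A minimal, library-free sketch of how single-precision accumulation drifts (class and variable names are illustrative):

```java
public class FloatDrift {
    public static void main(String[] args) {
        float singleSum = 0f;    // float32, like a typical NDArray dtype
        double doubleSum = 0.0;  // float64, like a Python float
        for (int i = 0; i < 1_000_000; i++) {
            singleSum += 0.1f;
            doubleSum += 0.1;
        }
        // Both sums should be ~100000, but float32 rounding error
        // accumulates orders of magnitude faster than float64 error.
        System.out.println("float32 sum: " + singleSum);
        System.out.println("float64 sum: " + doubleSum);
        System.out.println("float32 error: " + Math.abs(singleSum - 100_000));
        System.out.println("float64 error: " + Math.abs(doubleSum - 100_000));
    }
}
```

This is why a ~0.0001 gap in a loss sum is expected noise rather than a bug, while a growing divergence over training steps points at something else.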

What I did to try and solve it

  • Verified that the Python blocks and Java blocks were the same
  • Verified that blocks such as Dense/Linear were passing the same parameters to the engines
  • Debugged and inspected values at every step to see where arrays and gradients diverged
    • As mentioned, the divergence seems to appear after backward
    • Achieved this by printing the sum of each parameter’s NDArray and of its gradient
  • Verified that the inputs were the same
  • Verified that the loss functions were the same
  • Tried setting a random seed and removing Initializer.ONES

Why setting a random seed doesn’t work

I did try setting a random seed, and yes, setting the same seed on the Python and Java sides works in “theory”: random NDArrays I generated to test this were identical. The problem arises when the sequence of random calls differs, or when the calls themselves are not exactly the same. As an example, my code leverages DJL’s ability to initialize blocks automatically, whereas the Python side manually calls encoder.initializer() and then calls several other methods before calling forward; on the DJL side, initialization occurs right before forward is called. This is just one example from my code, and I could try to replicate the Python ordering, but that would not fix the underlying problem: MXNet Python and MXNet DJL do not share the exact same code or sequence for these random calls.
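The sequencing problem described above can be reproduced with any seeded generator. A minimal sketch using java.util.Random as a stand-in for the engine's RNG: one extra draw on one side shifts every subsequent value, even with identical seeds.

```java
import java.util.Random;

public class SeedOrder {
    public static void main(String[] args) {
        Random sideA = new Random(1234); // e.g. the DJL side
        Random sideB = new Random(1234); // e.g. the Python side

        // Same seed, same call sequence: values match exactly.
        System.out.println(sideA.nextDouble() == sideB.nextDouble());

        // Side A makes one extra draw (e.g. an earlier initialization step).
        sideA.nextDouble();

        // From here on, the two sequences are out of step and never realign.
        System.out.println(sideA.nextDouble() == sideB.nextDouble());
    }
}
```

This is exactly why a shared seed alone cannot make the two frameworks match: every draw consumed in a different order desynchronizes everything after it.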

Possible Solutions

  • Setting the random seed immediately before every initialization might work, but I’m not sure... What I mean is calling something like random.seed(1234) multiple times, right before every point where a random value is expected.
  • Set the same “random” arrays and gradients manually on both the Python and Java sides so they can be debugged in lockstep.
  • Check that all parameters sent to the MXNet engine are the same on both the Python and Java sides
    • I did this, but I may have missed something
  • Debug my code to see if I simply missed something

Build IJava kernel error

Windows 10
Python 3.7
Java 11.0.14
Gradle 4.8.1

When I run the command "gradlew installKernel" in cmd, I get the following error:

<-------------> 0% CONFIGURING [5s]
Failed to execute: python3 --version. of :shade
Stdout:

Stderr:

Failed to execute: python3 --version.
Stdout:

Stderr:

After copying D:\Python37\python.exe to D:\Python37\python3.exe, the IJava kernel built successfully.


FAILURE: Build failed with an exception.
