deepjavalibrary / d2l-java
The Java implementation of Dive into Deep Learning (D2L.ai)
Home Page: https://d2l.djl.ai
License: Other
Description
There seems to be a bug in the code for these sections that prevents me from replicating the loss results of the D2L Python book using DJL. I have tested various methods, discussed possible solutions with Zach, Frank, and Lai, and tried and implemented all of their suggestions without much result...
Environment
I tested my theories and the suggestions for section 10.4 using this code: https://github.com/markbookk/java-d2l-IDE/tree/main/section10_4. I also set all initializers to ONE on both the Java and Python sides to eliminate the randomness of Xavier (or similar) initialization.
I haven't completed section 10.7, since these three sections rely on the same code base. I originally saw this problem while creating section 9.7, and it later replicated in section 10.4, so it will also occur in section 10.7.
Problem
The problem occurs during training: the loss diverges from the expected result (the Python side) as training continues. I debugged and tried multiple things (described below), and I noticed the divergence appears after calling backward. I checked the NDArrays of the parameters and the gradient of each parameter before calling backward, and compared to the Python side they are exactly the same. The loss is also the same (there is a slight difference on the order of 0.0001, but only because Java's 32-bit floats carry fewer significant digits than Python's double-precision scalars). Although the loss sum matches, the prediction result is 0, so maybe that has something to do with my testing, but I tried multiple things and could not find a way to change this result while keeping the environment identical on both the Python and Java sides for comparison.
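The ~0.0001 precision gap mentioned above can be reproduced without any deep learning framework. This is a minimal sketch, assuming only that one side accumulates in 32-bit floats (as float32 NDArray data does) while the other works with 64-bit scalars: the same running sum drifts apart as small per-step rounding errors compound, much like a loss summed over many batches.

```java
public class FloatDrift {
    public static void main(String[] args) {
        // Accumulate the same values in float (32-bit) and double (64-bit).
        float floatSum = 0f;
        double doubleSum = 0d;
        for (int i = 1; i <= 1_000_000; i++) {
            float v = 1f / i; // identical input for both accumulators
            floatSum += v;
            doubleSum += v;
        }
        // The two sums agree only to a few decimal places: each addition
        // rounds to the nearest representable float, and the error compounds.
        System.out.println("float32 sum: " + floatSum);
        System.out.println("float64 sum: " + doubleSum);
        System.out.println("difference:  " + Math.abs(doubleSum - floatSum));
    }
}
```

This is why a small absolute difference in the loss sum alone does not indicate a real bug, whereas the growing divergence after backward does.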
What I did to try and solve it
Why setting a random seed doesn’t work
I did try setting a random seed, and setting the same seed on the Python and Java sides works in theory: random NDArrays I generated to test this were identical on both sides. The problem occurs when the sequence of random calls is not the same, or when the calls themselves differ. For example, my code leverages DJL's ability to initialize blocks automatically, whereas the Python side manually calls encoder.initialize() and then calls other methods before calling forward; on the DJL side, initialization happens right before forward is called. This is just one example from my code. I could try to replicate the Python ordering, but that would not fix the underlying problem: MXNet Python and MXNet via DJL do not execute exactly the same code, or the same sequence of random calls.
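The call-ordering problem can be illustrated with java.util.Random as a stand-in for MXNet's generator (an analogy, assuming nothing more than sequential PRNG behavior): two identically seeded generators stay in sync only while their call sequences match, and a single extra draw, such as an initialization happening at a different point, desynchronizes every value after it.

```java
import java.util.Random;

public class SeedOrdering {
    public static void main(String[] args) {
        long seed = 42L;
        Random pythonSide = new Random(seed); // stands in for the Python-side RNG
        Random javaSide = new Random(seed);   // stands in for the DJL-side RNG

        // Identical call sequences produce identical values.
        System.out.println(pythonSide.nextInt() == javaSide.nextInt()); // true

        // One side performs an extra draw, e.g. an eager initialization.
        pythonSide.nextInt();

        // The streams are now offset: subsequent draws no longer match,
        // even though both generators started from the same seed.
        System.out.println(pythonSide.nextInt() == javaSide.nextInt());
    }
}
```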
Possible Solutions
Two Sigma released a collection of JVM notebooks: https://github.com/twosigma/beakerx
BeakerX offers a rich JVM-language kernel as well as plotting utilities that can be used alongside it.
Windows 10
Python 3.7
Java 11.0.14
Gradle 4.8.1
When I run the command "gradlew installKernel" in cmd, I get this error:
<-------------> 0% CONFIGURING [5s]
Failed to execute: python3 --version. of :shade
Stdout:Stderr:
Failed to execute: python3 --version.
Stdout:Stderr:
After copying D:\Python37\python.exe to D:\Python37\python3.exe, I successfully built the IJava kernel.
FAILURE: Build failed with an exception.
Error msg:
[NbConvertApp] Converting notebook chapter_convolutional-neural-networks/lenet.ipynb to html
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop starting...
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop started.
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop starting...
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop started.
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop starting...
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop started.
[NbConvertApp] Executing notebook with kernel: java
Nov 16, 2020 7:33:16 PM io.github.spencerpark.ijava.JavaKernel formatError
WARNING:
jdk.jshell.EvalException: MXNet engine call failed: MXNetError: Check failed: ctx.dev_mask() == Context::kGPU (-1073741696 vs. 2) :
Stack trace:
File "src/resource.cc", line 138
This is a section in the appendix of the original D2L book, but it doesn't seem to be present here.
There are three references to it but no definition, and the book mentions it as though it were a recent chapter: https://github.com/search?q=repo%3Adeepjavalibrary%2Fd2l-java%20sec_naive_bayes&type=code
The example in 3.6.4 (Defining the Loss Function) is as follows:
NDArray yHat = manager.create(new float[][]{{0.1f, 0.3f, 0.6f}, {0.3f, 0.2f, 0.5f}});
yHat.get(new NDIndex(":, {}", manager.create(new int[]{0, 2})));
Actual output:
ND: (2, 2) cpu() float32
[[0.1, 0.6],
[0.3, 0.5],
]
Output in the text:
ND: (2, 1) gpu(0) float32
[[0.1],
[0.5],
]
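For reference, the two outputs correspond to two different indexing semantics. The (2, 1) output in the text is a per-row pick (row i takes column indices[i], as Python's y_hat[[0, 1], y] does), while the (2, 2) output actually produced selects whole columns 0 and 2 for every row. A plain-Java sketch of both interpretations (the helper names are hypothetical, not DJL API):

```java
import java.util.Arrays;

public class IndexSemantics {
    // Per-row pick: out[i] = yHat[i][indices[i]] — the (2, 1) output in the text.
    static float[] pickPerRow(float[][] yHat, int[] indices) {
        float[] out = new float[yHat.length];
        for (int i = 0; i < yHat.length; i++) {
            out[i] = yHat[i][indices[i]];
        }
        return out;
    }

    // Column selection: out[i][j] = yHat[i][indices[j]] — the (2, 2) output observed.
    static float[][] selectColumns(float[][] yHat, int[] indices) {
        float[][] out = new float[yHat.length][indices.length];
        for (int i = 0; i < yHat.length; i++) {
            for (int j = 0; j < indices.length; j++) {
                out[i][j] = yHat[i][indices[j]];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] yHat = {{0.1f, 0.3f, 0.6f}, {0.3f, 0.2f, 0.5f}};
        int[] indices = {0, 2};
        System.out.println(Arrays.toString(pickPerRow(yHat, indices)));        // [0.1, 0.5]
        System.out.println(Arrays.deepToString(selectColumns(yHat, indices))); // [[0.1, 0.6], [0.3, 0.5]]
    }
}
```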
The results plotted on the graph are not converging.
Incorrect figure in
https://github.com/deepjavalibrary/d2l-java/blob/master/chapter_linear-networks/linear-regression.ipynb
for the loss function:
https://resources.djl.ai/d2l-java/Neuron.svg
In the original book, Dive into Deep Learning, the figure is:
https://d2l.ai/_images/fit-linreg.svg