deepjavalibrary / d2l-java
The Java implementation of Dive into Deep Learning (D2L.ai)
Home Page: https://d2l.djl.ai
License: Other
Description
There seems to be a bug in the code for these sections that prevents me from replicating the loss results of the D2L Python book using DJL. I have tested various methods, discussed possible solutions with Zach, Frank, and Lai, and tried and implemented all of their suggestions without much result...
Environment
I tested my theories and the suggestions for section 10.4 using this code: https://github.com/markbookk/java-d2l-IDE/tree/main/section10_4. I also set all initializers to ONE on both the Java and Python sides to eliminate the randomness of Xavier (or similar) initialization.
I haven't completed section 10.7, since these three sections rely on the same code base. I originally saw this problem while creating section 9.7, and it later replicated in section 10.4, so it will also occur in section 10.7.
Problem
The problem occurs during training: the loss diverges from the expected result (the Python side) as training continues. I debugged and tried multiple things (described below), and I noticed the divergence appears after calling backward. I checked the NDArrays of the parameters and the gradient of each parameter before calling backward, and compared to the Python side they are exactly the same. The loss is also the same (there is a slight difference on the order of 0.0001, but only because Java's 32-bit floats carry fewer significant digits than Python's double-precision scalars). Although the loss sum matches, the prediction result is 0, so maybe that has something to do with my testing, but I tried multiple things and could not find a way to change this result while keeping the environment identical on both the Python and Java sides for comparison.
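The ~0.0001 precision gap mentioned above can be reproduced without any deep learning framework. This is a minimal sketch, assuming only that one side accumulates in 32-bit floats (as float32 NDArray data does) while the other works with 64-bit scalars: the same running sum drifts apart as small per-step rounding errors compound, much like a loss summed over many batches.

```java
public class FloatDrift {
    public static void main(String[] args) {
        // Accumulate the same values in float (32-bit) and double (64-bit).
        float floatSum = 0f;
        double doubleSum = 0d;
        for (int i = 1; i <= 1_000_000; i++) {
            float v = 1f / i; // identical input for both accumulators
            floatSum += v;
            doubleSum += v;
        }
        // The two sums agree only to a few decimal places: each addition
        // rounds to the nearest representable float, and the error compounds.
        System.out.println("float32 sum: " + floatSum);
        System.out.println("float64 sum: " + doubleSum);
        System.out.println("difference:  " + Math.abs(doubleSum - floatSum));
    }
}
```

This is why a small absolute difference in the loss sum alone does not indicate a real bug, whereas the growing divergence after backward does.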
What I did to try and solve it
Why setting a random seed doesn’t work
I did try setting a random seed, and setting the same seed on the Python and Java sides works in theory: random NDArrays I generated to test this were identical on both sides. The problem occurs when the sequence of random calls is not the same, or when the calls themselves differ. For example, my code leverages DJL's ability to initialize blocks automatically, whereas the Python side manually calls encoder.initialize() and then calls other methods before calling forward; on the DJL side, initialization happens right before forward is called. This is just one example from my code. I could try to replicate the Python ordering, but that would not fix the underlying problem: MXNet Python and MXNet via DJL do not execute exactly the same code, or the same sequence of random calls.
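The call-ordering problem can be illustrated with java.util.Random as a stand-in for MXNet's generator (an analogy, assuming nothing more than sequential PRNG behavior): two identically seeded generators stay in sync only while their call sequences match, and a single extra draw, such as an initialization happening at a different point, desynchronizes every value after it.

```java
import java.util.Random;

public class SeedOrdering {
    public static void main(String[] args) {
        long seed = 42L;
        Random pythonSide = new Random(seed); // stands in for the Python-side RNG
        Random javaSide = new Random(seed);   // stands in for the DJL-side RNG

        // Identical call sequences produce identical values.
        System.out.println(pythonSide.nextInt() == javaSide.nextInt()); // true

        // One side performs an extra draw, e.g. an eager initialization.
        pythonSide.nextInt();

        // The streams are now offset: subsequent draws no longer match,
        // even though both generators started from the same seed.
        System.out.println(pythonSide.nextInt() == javaSide.nextInt());
    }
}
```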
Possible Solutions
Two Sigma released a collection of JVM notebooks: https://github.com/twosigma/beakerx
BeakerX offers a rich JVM-language kernel as well as plotting utilities that can be used alongside it.
Windows 10
Python 3.7
Java 11.0.14
Gradle 4.8.1
When I run the command "gradlew installKernel" in cmd, I get this error:
<-------------> 0% CONFIGURING [5s]
Failed to execute: python3 --version. of :shade
Stdout:Stderr:
Failed to execute: python3 --version.
Stdout:Stderr:
After copying D:\Python37\python.exe to D:\Python37\python3.exe, I successfully built the IJava kernel.
FAILURE: Build failed with an exception.
Error msg:
[NbConvertApp] Converting notebook chapter_convolutional-neural-networks/lenet.ipynb to html
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop starting...
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop started.
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop starting...
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop started.
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop starting...
Nov 16, 2020 7:31:39 PM io.github.spencerpark.jupyter.channels.Loop start
INFO: Loop started.
[NbConvertApp] Executing notebook with kernel: java
Nov 16, 2020 7:33:16 PM io.github.spencerpark.ijava.JavaKernel formatError
WARNING:
jdk.jshell.EvalException: MXNet engine call failed: MXNetError: Check failed: ctx.dev_mask() == Context::kGPU (-1073741696 vs. 2) :
Stack trace:
File "src/resource.cc", line 138
This is a section in the appendix of the original D2L book, but it doesn't seem to be present here.
There are three references to it but no definition, and the book mentions it as though it were a recent chapter: https://github.com/search?q=repo%3Adeepjavalibrary%2Fd2l-java%20sec_naive_bayes&type=code
The example in 3.6.4 (Defining the Loss Function) is as follows:
NDArray yHat = manager.create(new float[][]{{0.1f, 0.3f, 0.6f}, {0.3f, 0.2f, 0.5f}});
yHat.get(new NDIndex(":, {}", manager.create(new int[]{0, 2})));
Actual output:
ND: (2, 2) cpu() float32
[[0.1, 0.6],
[0.3, 0.5],
]
Output in the text:
ND: (2, 1) gpu(0) float32
[[0.1],
[0.5],
]
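For reference, the two outputs correspond to two different indexing semantics. The (2, 1) output in the text is a per-row pick (row i takes column indices[i], as Python's y_hat[[0, 1], y] does), while the (2, 2) output actually produced selects whole columns 0 and 2 for every row. A plain-Java sketch of both interpretations (the helper names are hypothetical, not DJL API):

```java
import java.util.Arrays;

public class IndexSemantics {
    // Per-row pick: out[i] = yHat[i][indices[i]] — the (2, 1) output in the text.
    static float[] pickPerRow(float[][] yHat, int[] indices) {
        float[] out = new float[yHat.length];
        for (int i = 0; i < yHat.length; i++) {
            out[i] = yHat[i][indices[i]];
        }
        return out;
    }

    // Column selection: out[i][j] = yHat[i][indices[j]] — the (2, 2) output observed.
    static float[][] selectColumns(float[][] yHat, int[] indices) {
        float[][] out = new float[yHat.length][indices.length];
        for (int i = 0; i < yHat.length; i++) {
            for (int j = 0; j < indices.length; j++) {
                out[i][j] = yHat[i][indices[j]];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] yHat = {{0.1f, 0.3f, 0.6f}, {0.3f, 0.2f, 0.5f}};
        int[] indices = {0, 2};
        System.out.println(Arrays.toString(pickPerRow(yHat, indices)));        // [0.1, 0.5]
        System.out.println(Arrays.deepToString(selectColumns(yHat, indices))); // [[0.1, 0.6], [0.3, 0.5]]
    }
}
```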
The results plotted on the graph are not converging.
Incorrect figure in
https://github.com/deepjavalibrary/d2l-java/blob/master/chapter_linear-networks/linear-regression.ipynb
for the loss function:
https://resources.djl.ai/d2l-java/Neuron.svg
In the original book, Dive into Deep Learning, the figure is:
https://d2l.ai/_images/fit-linreg.svg