Comments (11)
yes. can you please create a midje test that demonstrates the error?
from bayadera.
Sure. Here you go:
`
(ns testing-uniform
(:use (clojure repl))
(:require [uncomplicate.neanderthal
[core :refer :all]
[native :refer :all]]
[midje.sweet :refer :all]
[uncomplicate.bayadera
[core :refer [sampler dataset density distribution evidence sample! sample]]
[library :as nl]
[cuda :refer [with-default-bayadera]]]
[uncomplicate.commons.core :refer [with-release]]))
(defn check-sample [n]
(facts
(with-default-bayadera
(with-release [uniform-dist (nl/uniform 1 2)
test-sampler (sampler uniform-dist)
test-sample (sample test-sampler n)]
((native (row test-sample 0))) => n ))))
`
Note that:
- The above will pass for small values of n (on my computer, n <= 1,000,000 works fine) - but anything higher will kill it (and once the error is triggered, any lower of value of n will still keep triggering it - you need to restart lein).
Exception thrown: clojure.lang.ExceptionInfo (CUDA error: CUDA_ERROR_ILLEGAL_ADDRESS.)
error - uncomplicate.clojurecuda.internal.utils - (utils.clj:37)
fn - uncomplicate.clojurecuda.internal.impl.CUModule - (impl.clj:91)
release - uncomplicate.clojurecuda.internal.impl.CUModule - (impl.clj:91)
release - uncomplicate.bayadera.internal.device.nvidia-gtx.GTXDatasetEngine - (nvidia_gtx.clj:155)
release - uncomplicate.bayadera.internal.device.nvidia-gtx.GTXBayaderaFactory - (nvidia_gtx.clj:763)
check-sample/fn/fn - testing-uniform - (form-init10386720772629001284.clj:14)
with-altered-roots* - midje.util.thread-safe-var-nesting - (thread_safe_var_nesting.clj:32)
check-sample/fn - testing-uniform - (form-init10386720772629001284.clj:14)
applyToHelper - (AFn.java:152)
applyTo - (AFn.java:144)
doInvoke - (AFunction.java:31)
invoke - (RestFn.java:397)
check-one/fn - midje.checking.facts - (facts.clj:32)
check-one - midje.checking.facts - (facts.clj:31)
creation-time-check - midje.checking.facts - (facts.clj:35)
check-sample - testing-uniform - (form-init10386720772629001284.clj:13)
eval60767 - testing-uniform - (form-init10386720772629001284.clj:1)
eval60767 - testing-uniform - (form-init10386720772629001284.clj:1)
eval - (Compiler.java:7176)
eval - (Compiler.java:7131)
eval - clojure.core - (core.clj:3214)
eval - clojure.core - (core.clj:3210)
repl/read-eval-print/fn - clojure.main - (main.clj:414)
repl/read-eval-print - clojure.main - (main.clj:414)
repl/fn - clojure.main - (main.clj:435)
repl - clojure.main - (main.clj:435)
evaluate - nrepl.middleware.interruptible-eval - (interruptible_eval.clj:106)
interruptible-eval/fn/fn - nrepl.middleware.interruptible-eval - (interruptible_eval.clj:140)
run - (AFn.java:22)
session-exec/main-loop/fn - nrepl.middleware.session - (session.clj:171)
session-exec/main-loop - nrepl.middleware.session - (session.clj:170)
run - (AFn.java:22)
run - (Thread.java:748)
- If you replace nl/uniform with nl/beta or any other distribution that doesn't use the GTXDirectSampler, it will work for values of n upto around 400,000,000 (provided the code in the pull request I created is implemented first - otherwise it will fail for other reasons). After that it gives another error of the form:
Actual:
java.lang.IllegalArgumentException: Value out of range for int: 4000000000
Hope this helps!
from bayadera.
OK. Let me finish a book chapter (1-2 days) and then I'll take a look at this and your pull request, and probably update bayadera to the latest neanderthal, too.
from bayadera.
Look forward to it!
from bayadera.
I'm working on this. Currently, there are many updates that I need to add to bayadera to make it keep up to the latest neanderthal.
Particularly, switching my AMD GPU from 290X to Vega64 a couple months ago required switching the drivers from old proprietary catalyst to the newest open-source ROCm raised many OpenCL compatibility breakages (their new OpenCL C compiler does not swallow many things that were allowed by catalyst).
This error might be caused by many changes that I made to bayadera in this migration process months ago. Now I'll see to make a thorough pass and update everything. I hope that it will reveal what makes this particular error. Anyway, this might take more than a few afternoons, but expect updates in a few days.
from bayadera.
Ah OK. That would be great (the "upgrade everything" part). But I'll just say that my error was on Cuda, not OpenCL. And if you look at the GTXDirectSamplerEngine code in the uncomplicate.bayadera.internal.device.nvidia_gtx.clj, it just seems like an older version of the code (lacks with-release etc), while the OpenCL version in the Amd_gcn.clj file (GCNDirectSamplerEngine) has a lot more idiomatic code - as in, similar to the rest of of the library. But yes, getting everything to work with the latest Neanderthal would be awesome!
Btw - any plans of making v1.0.0 of everything anytime soon? :)
from bayadera.
The CUDA/OpenCL discrepancy is not due to and older version of code, but by constraints on what is possible with cuda and OpenCL. That is exactly why I need to tidy everything up first and then fix this error. Otherwise, CUDA and OpenCL implementation would diverge, which would make them much harder to properly maintain. I am already using some advanced code there, and the behavior is highly dependent on OpenCL drivers, that I regularly do not have anywhere to look up for a solution, and am left to experimenting. In fact, this migration to ROCm caused a block where some AMD code won't run as expected even with fixes that run in other places, and there are zero results for the error and the wierd behavior that I get...
from bayadera.
Update: I've solved other showstoppers, so this issue will be next to look at (but not today).
from bayadera.
About this particular issue: there is a bug in the GTXSamplerEngines that launches too many threads. Instead of (dim res), the number of threads should be 4 times less (let's assume that you use power of two size increments, so this does not support dim=1234567). So, instead of:
(do (launch! sample-kernel (grid-1d (/ (dim res) 4) WGS) hstream
As I decided to change direct sampler substantially, the update will not come that soon, so please make this change in your own copy of the project.
I am not sure why this error does not manifest with smaller arrays. It should fail with any size, but it doesn't.
from bayadera.
Thanks a lot for your time, and it works!
from bayadera.
Fixed in latest commits.
from bayadera.
Related Issues (8)
- [RFC] SYCL backend for BAYADERA HOT 6
- Hi , we are trying to run lein test and are getting this errors , can you help us ? HOT 7
- Could not find artifact uncomplicate:neanderthal:jar:0.10.0-SNAPSHOT HOT 2
- CL_BUILD_PROGRAM_FAILURE HOT 6
- documentation HOT 1
- Dedicated discussion server
- Website is unreachable HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from bayadera.