Giter VIP home page Giter VIP logo

Comments (11)

blueberry avatar blueberry commented on May 20, 2024

yes. can you please create a midje test that demonstrates the error?

from bayadera.

gurvesh avatar gurvesh commented on May 20, 2024

Sure. Here you go:

`
(ns testing-uniform
(:use (clojure repl))
(:require [uncomplicate.neanderthal
[core :refer :all]
[native :refer :all]]
[midje.sweet :refer :all]
[uncomplicate.bayadera
[core :refer [sampler dataset density distribution evidence sample! sample]]
[library :as nl]
[cuda :refer [with-default-bayadera]]]
[uncomplicate.commons.core :refer [with-release]]))

(defn check-sample [n]
(facts
(with-default-bayadera
(with-release [uniform-dist (nl/uniform 1 2)
test-sampler (sampler uniform-dist)
test-sample (sample test-sampler n)]

   ((native (row test-sample 0))) => n ))))

`

Note that:

  1. The above will pass for small values of n (on my computer, n <= 1,000,000 works fine) - but anything higher will kill it (and once the error is triggered, any lower of value of n will still keep triggering it - you need to restart lein).

Exception thrown: clojure.lang.ExceptionInfo (CUDA error: CUDA_ERROR_ILLEGAL_ADDRESS.)

error - uncomplicate.clojurecuda.internal.utils - (utils.clj:37)
fn - uncomplicate.clojurecuda.internal.impl.CUModule - (impl.clj:91)
release - uncomplicate.clojurecuda.internal.impl.CUModule - (impl.clj:91)
release - uncomplicate.bayadera.internal.device.nvidia-gtx.GTXDatasetEngine - (nvidia_gtx.clj:155)
release - uncomplicate.bayadera.internal.device.nvidia-gtx.GTXBayaderaFactory - (nvidia_gtx.clj:763)
check-sample/fn/fn - testing-uniform - (form-init10386720772629001284.clj:14)
with-altered-roots* - midje.util.thread-safe-var-nesting - (thread_safe_var_nesting.clj:32)
check-sample/fn - testing-uniform - (form-init10386720772629001284.clj:14)
applyToHelper - (AFn.java:152)
applyTo - (AFn.java:144)
doInvoke - (AFunction.java:31)
invoke - (RestFn.java:397)
check-one/fn - midje.checking.facts - (facts.clj:32)
check-one - midje.checking.facts - (facts.clj:31)
creation-time-check - midje.checking.facts - (facts.clj:35)
check-sample - testing-uniform - (form-init10386720772629001284.clj:13)
eval60767 - testing-uniform - (form-init10386720772629001284.clj:1)
eval60767 - testing-uniform - (form-init10386720772629001284.clj:1)
eval - (Compiler.java:7176)
eval - (Compiler.java:7131)
eval - clojure.core - (core.clj:3214)
eval - clojure.core - (core.clj:3210)
repl/read-eval-print/fn - clojure.main - (main.clj:414)
repl/read-eval-print - clojure.main - (main.clj:414)
repl/fn - clojure.main - (main.clj:435)
repl - clojure.main - (main.clj:435)
evaluate - nrepl.middleware.interruptible-eval - (interruptible_eval.clj:106)
interruptible-eval/fn/fn - nrepl.middleware.interruptible-eval - (interruptible_eval.clj:140)
run - (AFn.java:22)
session-exec/main-loop/fn - nrepl.middleware.session - (session.clj:171)
session-exec/main-loop - nrepl.middleware.session - (session.clj:170)
run - (AFn.java:22)
run - (Thread.java:748)
  1. If you replace nl/uniform with nl/beta or any other distribution that doesn't use the GTXDirectSampler, it will work for values of n upto around 400,000,000 (provided the code in the pull request I created is implemented first - otherwise it will fail for other reasons). After that it gives another error of the form:
    Actual:
    java.lang.IllegalArgumentException: Value out of range for int: 4000000000

Hope this helps!

from bayadera.

blueberry avatar blueberry commented on May 20, 2024

OK. Let me finish a book chapter (1-2 days) and then I'll take a look at this and your pull request, and probably update bayadera to the latest neanderthal, too.

from bayadera.

gurvesh avatar gurvesh commented on May 20, 2024

Look forward to it!

from bayadera.

blueberry avatar blueberry commented on May 20, 2024

I'm working on this. Currently, there are many updates that I need to add to bayadera to make it keep up to the latest neanderthal.

Particularly, switching my AMD GPU from 290X to Vega64 a couple months ago required switching the drivers from old proprietary catalyst to the newest open-source ROCm raised many OpenCL compatibility breakages (their new OpenCL C compiler does not swallow many things that were allowed by catalyst).

This error might be caused by many changes that I made to bayadera in this migration process months ago. Now I'll see to make a thorough pass and update everything. I hope that it will reveal what makes this particular error. Anyway, this might take more than a few afternoons, but expect updates in a few days.

from bayadera.

gurvesh avatar gurvesh commented on May 20, 2024

Ah OK. That would be great (the "upgrade everything" part). But I'll just say that my error was on Cuda, not OpenCL. And if you look at the GTXDirectSamplerEngine code in the uncomplicate.bayadera.internal.device.nvidia_gtx.clj, it just seems like an older version of the code (lacks with-release etc), while the OpenCL version in the Amd_gcn.clj file (GCNDirectSamplerEngine) has a lot more idiomatic code - as in, similar to the rest of of the library. But yes, getting everything to work with the latest Neanderthal would be awesome!

Btw - any plans of making v1.0.0 of everything anytime soon? :)

from bayadera.

blueberry avatar blueberry commented on May 20, 2024

The CUDA/OpenCL discrepancy is not due to and older version of code, but by constraints on what is possible with cuda and OpenCL. That is exactly why I need to tidy everything up first and then fix this error. Otherwise, CUDA and OpenCL implementation would diverge, which would make them much harder to properly maintain. I am already using some advanced code there, and the behavior is highly dependent on OpenCL drivers, that I regularly do not have anywhere to look up for a solution, and am left to experimenting. In fact, this migration to ROCm caused a block where some AMD code won't run as expected even with fixes that run in other places, and there are zero results for the error and the wierd behavior that I get...

from bayadera.

blueberry avatar blueberry commented on May 20, 2024

Update: I've solved other showstoppers, so this issue will be next to look at (but not today).

from bayadera.

blueberry avatar blueberry commented on May 20, 2024

About this particular issue: there is a bug in the GTXSamplerEngines that launches too many threads. Instead of (dim res), the number of threads should be 4 times less (let's assume that you use power of two size increments, so this does not support dim=1234567). So, instead of:

(do (launch! sample-kernel (grid-1d (dim res) WGS) hstream

(do (launch! sample-kernel (grid-1d (/ (dim res) 4) WGS) hstream

As I decided to change direct sampler substantially, the update will not come that soon, so please make this change in your own copy of the project.

I am not sure why this error does not manifest with smaller arrays. It should fail with any size, but it doesn't.

from bayadera.

gurvesh avatar gurvesh commented on May 20, 2024

Thanks a lot for your time, and it works!

from bayadera.

blueberry avatar blueberry commented on May 20, 2024

Fixed in latest commits.

from bayadera.

Related Issues (8)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.