Comments (17)
Hi @fommil,
As OpenCL is meant to operate on devices with disparate memory address spaces, it treats memory in a 'black box' fashion. Allocating OpenCL memory does not in fact return a pointer; it returns a handle that is meant to be used in further OpenCL operations. These handles are cl_mem objects, and our APIs take the cl_mem objects as parameters. As you cannot apply pointer arithmetic to a handle, we add an extra offset parameter for every cl_mem parameter, to allow a user to specify a starting offset into the buffer.
The extra parameters appended to the BLAS API are the OpenCL objects that control the execution of the OpenCL kernels. If you set them to NULL, the API will not do anything and, I'm sure, will appear to run very fast.
We provide library documentation for our API, but it already assumes that you are familiar and comfortable with the OpenCL language. If you would like to start learning OpenCL, the OpenCL specification is not a terrible read, and then AMD has additional resources for developers.
from clblas.
@kknox a little example of how to call the BLAS functions wouldn't go amiss. the equivalent cuBLAS functions are much more closely aligned with the original BLAS API in comparison... although it is rather frustrating that neither library actually implements the BLAS that decades of middleware has conformed to. Hence my wrapper layer.
from clblas.
you don't have an explicit dgemm example, but the C examples you pointed me at were useful.
it looks to me like you're still some way from users being able to call you as BLAS. I'll attempt to wrap DDOT and DGEMM over the coming months, but I'll pause at that point to see where to go.
from clblas.
Hi @fommil
If you are looking for code examples of how to call the BLAS functions, take a look at the samples directory of the repository; we have simple examples of calling almost every routine that we support in single precision. You should be able to compile and step through a sample in a debugger and see what is needed to initialize OpenCL and call into a BLAS routine.
We recognize that the clBLAS API differs slightly from the traditional Netlib BLAS API; we did not break the BLAS API lightly or arbitrarily. Designing for heterogeneous platforms like modern GPU platforms necessitates different decisions than were made 30 years ago for homogeneous platforms like traditional CPU servers. There is a heavy cost in transferring data to and from the heterogeneous device (e.g. the GPU over the PCI Express bus), and if data is managed carelessly, the performance will actually be worse than not having offloaded the computation in the first place.
Our API, built on top of OpenCL, allows our clients to manage their own data. They control when and where data is transferred to and from the heterogeneous device. This is the reason that we added the extra OpenCL parameters to the BLAS APIs; the user manages the OpenCL state and passes it into the library, which ultimately generates OpenCL kernels and enqueues them into the command queue. With this API, the client controls when data is transferred to the device, executes a series of BLAS calls (or user-defined kernels) while the data remains on the device, and then transfers data back to the host only when they are done processing. Otherwise, data makes a round trip to the device and back on every BLAS call, and you find yourself in the uncomfortable situation where you would have been better off not offloading to the device in the first place 😃
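To put a rough shape on the round-trip cost described above, here is a minimal sketch that only counts bytes crossing the bus. It assumes a hypothetical workload of n_calls repetitions of C = A*B + C on square n x n double matrices; the function names are made up for illustration and nothing here is part of clBLAS or OpenCL. It ignores bandwidth, latency and compute time entirely.

```c
#include <stddef.h>

/* Hypothetical model (not clBLAS): bytes moved over the bus when every
 * call uploads A, B and C and then downloads C again. */
static size_t bytes_round_trip(size_t n, size_t n_calls)
{
    size_t matrix = n * n * sizeof(double);  /* one n x n matrix */
    return n_calls * 4 * matrix;             /* 3 uploads + 1 download per call */
}

/* Hypothetical model (not clBLAS): bytes moved when A, B and C are uploaded
 * once, all calls run on the device, and C is downloaded once at the end. */
static size_t bytes_resident(size_t n, size_t n_calls)
{
    size_t matrix = n * n * sizeof(double);
    (void)n_calls;                           /* traffic does not grow with calls */
    return 4 * matrix;                       /* 3 uploads + 1 download in total */
}
```

Under these assumptions the round-trip traffic grows linearly with the number of calls, while the resident traffic stays constant.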
from clblas.
@kknox can you please take a look at this? It's a translation of your sgemm sample.
https://github.com/fommil/netlib-java/blob/master/perf/src/main/c/clwrapper.c
When I run my test file
https://github.com/fommil/netlib-java/blob/master/perf/src/main/c/dgemmtest.c
(compilation instructions at the top)
I see this :-(
found 1 OpenCL platforms
found 1 OpenCL devices
created context
created command queue
setup clblas
created buffers
enqueud buffers
Segmentation fault: 11
I'm on OS X. Note that I changed the CL_DEVICE_TYPE_GPU as I was getting 0 devices with it. I have another machine that I can try this out on... perhaps my laptop doesn't have GPU OpenCL (first I've heard of it! It's an Intel HD Graphics 3000).
from clblas.
for completeness, I thought I would note that my Macbook Air doesn't seem to support OpenCL on the GPU :-( http://forums.macrumors.com/showthread.php?t=1119312
from clblas.
Apple will provide OpenCL 1.2 support on the integrated Iris graphics of Haswell-based MBAs in Mavericks when that's released soon. Sounds like you'll be justified in treating yourself to a new laptop! ;-)
http://forums.macrumors.com/showthread.php?t=1620203
http://docs.huihoo.com/apple/wwdc/2013/session_508__working_with_opencl.pdf
Simon
On 11 Sep 2013, at 15:49, Sam Halliday [email protected] wrote:
for completeness, I thought I would note that my Macbook Air doesn't seem to support OpenCL on the GPU :-( http://forums.macrumors.com/showthread.php?t=1119312
Head of Microelectronics Group and University of Bristol Business Fellow
High Performance Computing and Architectures, Department of Computer Science
University of Bristol, Merchant Venturers Building, Woodland Road, Clifton, Bristol, BS8 1UB, UK
Phone: +44 (0)117 331 5324, Twitter: simonmcs, Web: http://www.cs.bris.ac.uk/~simonm/
Microelectronics Group webpage: http://www.cs.bris.ac.uk/Research/Micro/
from clblas.
@simonmcs heh, nah... I've got a relatively new iMac that I'll use for GPU performance tests. And clBLAS needs to work without segfaults before I can rationalise a frivolous upgrade :-P
from clblas.
I don't understand what you are trying to do here
size_t off = 1;
size_t offA = K + 1; /* K + off */
size_t offB = N + 1; /* N + off */
size_t offC = N + 1; /* N + off */
To use clBLAS, all you need to do is set the offsets to 0 and pass the other parameters as is. You are making it more complicated than it needs to be. The segmentation fault is likely occurring because you are using your CPU as your OpenCL device and the wrapper code you have written is trying to access elements that are out of bounds.
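A minimal sketch of the out-of-bounds concern, assuming row-major storage with the leading dimension equal to the row length (lda = K for an M x K matrix A). The helper view_fits is hypothetical and not part of clBLAS; it only checks whether a rows x cols view starting at a given element offset stays inside a buffer of len elements.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical helper (not part of clBLAS): true when a row-major
 * rows x cols view that starts at element offset `off` and uses leading
 * dimension `ld` lies entirely inside a buffer of `len` elements.
 * Arithmetic overflow is not handled in this sketch. */
static bool view_fits(size_t len, size_t off,
                      size_t rows, size_t cols, size_t ld)
{
    if (rows == 0 || cols == 0)
        return true;               /* an empty view touches nothing */
    if (cols > ld)
        return false;              /* a row cannot be wider than its stride */
    /* Index one past the last element the view touches. */
    size_t end = off + (rows - 1) * ld + cols;
    return end <= len;
}
```

With M = 4 and K = 3, the buffer for A holds 12 elements. view_fits(12, 0, 4, 3, 3) is true, whereas view_fits(12, K + 1, 4, 3, 3) is false: keeping the full dimensions while starting at offset K + 1 runs past the end of the buffer.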
from clblas.
@pavanky I am copying the code from the example. I don't understand why the offsets are +1! I thought it was some device-specific nonsense.
from clblas.
The example has the following line.
/* Call clblas extended function. Perform gemm for the lower right sub-matrices */
Since you want matrix multiplication on the entire matrix, try setting offsets to 0 for your case. Use M, N, K, LDA, LDB, LDC directly.
from clblas.
oh, I missed that bit :-D
now, why would a gemm example not do gemm?
from clblas.
@fommil it is doing gemm, but only on the bottom right corner of the buffers.
The equivalent in standard gemm would've used something like A + offA, B + offB and C + offC.
This kind of an API is necessary for OpenCL because such offsets to pointers are not possible from the host side. But such offsets are required for some libraries that are downstream from BLAS (such as various LAPACK implementations).
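As a rough illustration of the base-plus-offset convention, here is a CPU-only sketch. ref_gemm_offset is a hypothetical reference routine, not the clBLAS implementation; it assumes row-major storage with no transposes, and it takes plain host pointers where clBLAS would take cl_mem handles.

```c
#include <stddef.h>

/* Hypothetical CPU reference (not clBLAS): computes
 * C = alpha * A * B + beta * C on row-major data without transposes.
 * Each matrix is passed as a base pointer plus an element offset,
 * mirroring the (cl_mem, offset) pairs in the clBLAS signature. */
static void ref_gemm_offset(size_t M, size_t N, size_t K, double alpha,
                            const double *A, size_t offA, size_t lda,
                            const double *B, size_t offB, size_t ldb,
                            double beta,
                            double *C, size_t offC, size_t ldc)
{
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            double acc = 0.0;
            /* Dot product of row i of A with column j of B. */
            for (size_t k = 0; k < K; ++k)
                acc += A[offA + i * lda + k] * B[offB + k * ldb + j];
            size_t idx = offC + i * ldc + j;
            C[idx] = alpha * acc + beta * C[idx];
        }
    }
}
```

With all offsets at 0 and the full M, N, K this is an ordinary gemm. To work on the lower right sub-matrices, skipping `off` rows and columns, you would pass M - off, N - off, K - off together with offA = off * lda + off (and likewise for B and C), which for off = 1 and lda = K gives the K + 1 quoted earlier in the thread.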
from clblas.
@pavanky I'm still getting the segfault with no offsets. Actually, this happened last night too and that's why I added all the offsets (I thought it was some hocus pocus and didn't see the note about sub matrices).
from clblas.
I get the segfaults when on a GPU device as well. I won't be able to test this again until next weekend.
from clblas.
@kknox @pavanky I'm still unable to get results with clBLAS but I've been able to run some DGEMM tests with CUDA to confirm your comments about the memory overhead. Indeed, it is pretty spectacular. Turquoise (light blue below the red lines, keep pace with the green ATLAS) is CUDA + overhead, dark blue is CUDA just the dgemm call (and I checked that it is computing the result correctly!)
from clblas.
Closing old clBLAS issues for the new year
I believe that this question has been answered, in part here and in part with the comments in #12.
from clblas.