openhackathons-org / nways_accelerated_programming
N-Ways to GPU Programming Bootcamp
License: Apache License 2.0
Cupy:
Example 4 - typo. Should be 'output may vary'.
Regarding RDF calculation, you have to scan all the pairs of atoms there, but as it's implemented it:
Regarding the challenge, the OpenACC/C solution has:
    #pragma acc parallel loop collapse(2) default(present)
    for (i = 1; i <= m; i++)
    {
        for (j = 1; j <= n; j++)
        {
            tmp = newarr[i * (m + 2) + j] - oldarr[i * (m + 2) + j];
            dsq += tmp * tmp;
        }
    }
Shouldn't dsq be declared as reduction(+:dsq)?
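To illustrate what a reduction(+:dsq) clause asks for, here is a minimal pure-Python sketch (no GPU and no OpenACC involved). It assumes a square grid (m == n) stored as a flat (m + 2) x (m + 2) array with a one-cell halo, which is an assumption rather than something confirmed by the lab code. The function names, the test data, and the worker count are hypothetical. Each simulated worker accumulates a private partial sum over its share of rows, and the partial sums are combined at the end.

```python
def dsq_serial(newarr, oldarr, m, n):
    """Reference result: the same loop nest as the C code, run serially."""
    dsq = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            tmp = newarr[i * (m + 2) + j] - oldarr[i * (m + 2) + j]
            dsq += tmp * tmp
    return dsq


def dsq_reduced(newarr, oldarr, m, n, workers=4):
    """Simulate a sum reduction: private partial sums, combined at the end."""
    partials = [0.0] * workers
    for w in range(workers):
        # Hypothetical work split: worker w takes interior rows
        # w + 1, w + 1 + workers, w + 1 + 2 * workers, ...
        for i in range(1 + w, m + 1, workers):
            for j in range(1, n + 1):
                tmp = newarr[i * (m + 2) + j] - oldarr[i * (m + 2) + j]
                partials[w] += tmp * tmp  # private accumulator, no sharing
    # Combine step: this is the part a reduction clause provides.
    return sum(partials)


# Made-up test data on an assumed square grid with a one-cell halo.
m = n = 8
size = (m + 2) * (m + 2)
oldarr = [float(k % 7) for k in range(size)]
newarr = [float(k % 5) for k in range(size)]

serial = dsq_serial(newarr, oldarr, m, n)
reduced = dsq_reduced(newarr, oldarr, m, n)
print(serial, reduced)
```

Without a reduction (or an atomic update), concurrent dsq += ... updates to a single shared scalar would race. Some compilers may detect such scalar reductions on their own, so checking the compiler feedback is one way to confirm what was generated for this loop.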
Challenge lab - The instructions for running the code are not clear. The block for running the serial code only works before you start editing the code, which causes confusion when trying to get the serial run timing.
Text in the notebooks should be updated to refer to nsys-rep reports instead of qdrep. Text and execution cells should be updated concerning the -ta flag for OpenACC compilation.
The HPC SDK is old and should be updated to the latest version (NVIDIA HPC SDK Version 24.3)
Downloading the profiling report and viewing it in a locally installed Nsight Systems GUI disconnects participants from the notebooks. It is possible to install and run Nsight Systems within the Jupyter Notebook, as shown here: link
In the ISO notebook, section "Compile for TESLA GPU (ISO C++)", the cell only compiles the code but doesn't run it. Whilst the lab does say "Make sure to validate the output by running the executable", I think it would be easier if that was just part of the command in the cell, similar to the earlier cells in the notebook.
One of the solutions (OpenACC, C++), collapse_rdf, has a "#pragma acc" on the inner loop which shouldn't be there; it won't compile with this in it.
The OpenACC notebook mentions avoiding multiple loops in a parallel region without context - would be nice to have a "why" associated with this.
CuPy Fundamentals:
Typo in Data transfer: d_X = cp.asarray(x) should be cp.asarray(h_X).
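The naming convention behind this fix can be sketched as follows: h_ for host (NumPy) arrays and d_ for device (CuPy) arrays. This is a minimal sketch, not the notebook's code. The array contents are made up, and the NumPy fallback is an assumption added only so the snippet also runs on a machine without CuPy or a CUDA device.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device
    have_gpu = True
except Exception:
    cp = None
    have_gpu = False

# Host array: lives in CPU memory, hence the h_ prefix.
h_X = np.arange(6, dtype=np.float32).reshape(2, 3)

if have_gpu:
    d_X = cp.asarray(h_X)   # host -> device copy
    d_Y = d_X * 2           # computed on the GPU
    h_Y = cp.asnumpy(d_Y)   # device -> host copy
else:
    # Fallback (assumption): same arithmetic on the CPU, no transfer.
    h_Y = h_X * 2

print(h_Y)
```

Passing an undefined name such as x to cp.asarray would raise a NameError, which is why the argument has to match the host array defined earlier (h_X).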
Cupy: Exercise 4: Is the expected output correct? This does not match the output of my solutions to Exercises 3 and 4.
Cannot build nways_Docker_python as nvidia/cuda:11.4.2-devel-ubuntu20.04 is no longer available from docker.io:
docker pull nvidia/cuda:11.4.2-devel-ubuntu20.04
Error response from daemon: manifest for nvidia/cuda:11.4.2-devel-ubuntu20.04 not found: manifest unknown: manifest unknown
The line cd ../../python/source_code/serial&& nsys profile --stats=true --force-overwrite true -o serial_cpu_rdf python3 nways_serial_overview.py
does not find the input properly.
Cupy:
Example 6, Step 3 - typo. Should be 'set reduction expression a + b' or 'set reduction expression for a and b'. The ampersand (&) is an operator in Python, so it is confusing to use that here to denote 'and'.
Cupy:
Example 5 - typo. Capitalize "step 5" -> Step 5.
The following is a list of changes to the Python N-Ways materials suggested by Robert Searles and Jonathan Dursi.
JIT kernels
• Can we move this before CUDA kernels?
• Maybe add Numba Vectorize as an introduction? The following flow: Vectorize -> JIT -> CuPy CUDA makes more sense than CuPy CUDA -> JIT
• In fact, is the order of cupy then numba the right way to go? Can we flip those sections?
Numba notebook:
Exercise 1
• Again, exercise is too easy; students will just copy and paste. Could we make them change it to float, and multiply? Or some slightly deeper change?
Thread re-use - this comes out of nowhere
Matrix multiply:
• Same idea, could we do a naïve matrix transpose instead?
Numba vectorize/ufuncs
• This seems out of place. It doesn't make sense to me to have this come before Numba CUDA kernels and interrupt the flow between Numba CUDA kernels and atomics.
Atomic
• It would be nice if the atomic example for a reduction built on an earlier example, say calculating the average matrix element after the multiplication or something similar.
Create a replica of the current N-Ways to GPU Programming content with a CFD example based on the miniWeather example. The N-Ways content is available on GitHub (https://github.com/openhackathons-org/nways_accelerated_programming). The miniWeather example is available on GitHub (https://github.com/openhackathons-org/gpubootcamp/tree/master/hpc/miniprofiler). To complement the existing content, contribute to one of the below:
To extend the current N-Ways content, create a version using Python cuNumeric, Legate (https://github.com/nv-legate/legate.core), and OpenAI Triton (https://openai.com/research/triton). This will extend the current N-Ways Bootcamp with Python, which uses CuPy and Numba (available at https://github.com/openhackathons-org/nways_accelerated_programming/tree/main/_basic/python ). Use the existing RDF code as a starting point. The application must be profiled and assessed at each step, similar to the Numba and CuPy versions. You are welcome to choose one of these.
Cupy:
Example 1 has a typo. Should be cuda.Device(0). Since we only have one MIG instance, using Device(1) will throw an error.