dr-noob / peakperf
Achieve peak performance on x86 CPUs and NVIDIA GPUs
License: GNU General Public License v2.0
Hi,
I got the following error:
Unknown microarchitecture detected: M=0x00000007 EM=0x0000000B F=0x00000006 EF=0x00000000 S=0x00000001
The CPU is a 13th-gen Intel Core i7-13700K.
If you need anything else let me know.
Dennis
Microarch: Knights Landing
Benchmark: Zen (AVX2)
Hey Dr-noob, 2 years ago you posted something about hacking pacybits. I need help with it and I'm wondering if you could help me. Can I contact you somewhere?
I know your post is old, but it would be really appreciated if you could help.
[ERROR]: Found invalid uarch: 'Zen 3'
[ERROR]: peakperf is unable to automatically select the benchmark for your CPU. Please, select the benchmark manually (see peakperf -h) and/or post this error message in https://github.com/Dr-Noob/peakperf/issues
I've tried running FLOPS on Windows:
First, one has to change some int and long variables to stdint types (int32_t and int64_t). After that, I tried running it and the performance was horrible. Then, looking at the assembly, I figured out that the loop was compiled poorly; I noticed I was using a 32-bit compiler (mingw, which is 32 bits). Lastly, I tried compiling it with a 64-bit compiler (mingw-w64), using both the Windows build (MingW-W64-builds) and the Arch Linux one (mingw-w64-gcc-bin). Both gave me the same result: segmentation fault. I found somewhere that running fine in 32 bits but segfaulting in 64 bits could be caused by stack issues, which should be solved using -fno-stack-protector. This does not solve the segfault. I love you, Windows ❤️
Benchmarks that can run on the current CPU could be highlighted in green, while unsupported ones could be highlighted in red.
Hello!
I noticed the following during build:
./build.sh
...
-- The CXX compiler identification is GNU 13.2.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /opt/cuda/bin/nvcc
-- The CUDA compiler identification is NVIDIA 12.2.140
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- ----------------------
-- peakperf build report:
-- CPU mode: ON
-- GPU mode: ON
-- ----------------------
-- Configuring done (7.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/william/src/peakperf/build
[ 5%] Building CXX object CMakeFiles/cpu_device.dir/src/cpu/cpufetch/cpufetch.cpp.o
[ 5%] Building CXX object CMakeFiles/512_8.dir/src/cpu/arch/512_8.cpp.o
[ 11%] Building CXX object CMakeFiles/cpu_device.dir/src/cpu/cpufetch/cpuid.cpp.o
[ 11%] Building CXX object CMakeFiles/512_12.dir/src/cpu/arch/512_12.cpp.o
[ 14%] Building CUDA object CMakeFiles/gpu_device.dir/src/gpu/arch.cu.o
[ 20%] Building CXX object CMakeFiles/cpu_device.dir/src/cpu/cpufetch/uarch.cpp.o
[ 23%] Building CXX object CMakeFiles/256_5.dir/src/cpu/arch/256_5.cpp.o
[ 29%] Building CXX object CMakeFiles/cpu_device.dir/src/cpu/arch/arch.cpp.o
[ 29%] Building CXX object CMakeFiles/256_6.dir/src/cpu/arch/256_6.cpp.o
[ 35%] Building CXX object CMakeFiles/256_6_nofma.dir/src/cpu/arch/256_6_nofma.cpp.o
[ 35%] Building CUDA object CMakeFiles/gpu_device.dir/src/gpu/kernel.cu.o
[ 35%] Building CXX object CMakeFiles/cpu_device.dir/src/cpu/arch/arch_sse.cpp.o
[ 38%] Building CXX object CMakeFiles/128_6.dir/src/cpu/arch/128_6.cpp.o
[ 44%] Building CXX object CMakeFiles/cpu_device.dir/src/cpu/arch/arch_avx512.cpp.o
[ 47%] Building CXX object CMakeFiles/cpu_device.dir/src/cpu/arch/arch_avx.cpp.o
[ 47%] Building CXX object CMakeFiles/256_8.dir/src/cpu/arch/256_8.cpp.o
[ 52%] Building CXX object CMakeFiles/128_8.dir/src/cpu/arch/128_8.cpp.o
[ 52%] Building CXX object CMakeFiles/256_10.dir/src/cpu/arch/256_10.cpp.o
nvcc fatal : Unsupported gpu architecture 'compute_35'
make[2]: *** [CMakeFiles/gpu_device.dir/build.make:76: CMakeFiles/gpu_device.dir/src/gpu/arch.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
nvcc fatal : Unsupported gpu architecture 'compute_35'
make[2]: *** [CMakeFiles/gpu_device.dir/build.make:90: CMakeFiles/gpu_device.dir/src/gpu/kernel.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:401: CMakeFiles/gpu_device.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 55%] Linking CXX static library lib512_12.a
[ 58%] Linking CXX static library lib512_8.a
[ 58%] Built target 512_12
[ 58%] Built target 512_8
[ 61%] Linking CXX static library lib256_5.a
[ 64%] Linking CXX static library lib256_8.a
[ 67%] Linking CXX static library lib256_10.a
[ 70%] Linking CXX static library lib256_6.a
[ 73%] Linking CXX static library lib128_8.a
[ 76%] Linking CXX static library lib128_6.a
[ 76%] Built target 256_5
[ 79%] Built target 256_8
[ 79%] Built target 256_10
[ 79%] Linking CXX static library lib256_6_nofma.a
[ 79%] Built target 256_6
[ 79%] Built target 128_8
[ 82%] Linking CXX static library libcpu_device.a
[ 82%] Built target 128_6
[ 82%] Built target 256_6_nofma
[ 82%] Built target cpu_device
make: *** [Makefile:136: all] Error 2
The relevant part being nvcc fatal : Unsupported gpu architecture 'compute_35'.
I looked at the code briefly, but couldn't see anything obvious that would cause it to target compute_35.
The getGencode script is just a validation tool I cobbled together for another project: https://github.com/wallentx/alpha-report/blob/90ee2e7c006dcfd75dd76fe31ffc0a866179d819/get-gencode#L1-L33
If the compiler is old, it may not support newer -march flags, like znver2.
Run peakperf on the CPU and GPU at the same time:
device == DEVICE_TYPE_HYBRID
Nº  Time(s)   TFLOP/s (CPU + GPU)
1   2.50984   4.300 (500 + 3800)
2   2.50898   4.310 (500 + 3810)
Same as tensor cores, but with RT cores. Not sure whether these RT cores will provide more performance than the tensor cores, though.
The table in https://github.com/Dr-Noob/peakperf#62-gpu also needs to be updated with proper information.
Because even though a CPU may support an instruction set, it may not support instructions generated by certain -march flags.
When I run the benchmark with >8 threads, my average performance is consistently lower than the expected GFLOPS. Running peakperf with no arguments yields an expected peak of 2048 GFLOP/s but an average performance under 1200 (I have a decent undervolt enabled). Specifying 8 threads changes the printed value to 1024 GFLOP/s, and I'm seeing an average performance > 1100 (but less than the results for t = 16). I'm not sure if this is an issue with my CPU configuration or the test expectations.
Something that makes the user understand that computing is taking place.
For example, a progress bar, like:
|████████████████                | 50%
|████████████████████████████████| 100%
or a blinking cursor like this function does:
spin() {
  # Cycle through the four marks forever, redrawing in place with \r
  local -a marks=( '/' '-' '\' '|' )
  local i=0
  while true; do
    printf '%s\r' "${marks[i++ % ${#marks[@]}]}"
    sleep 0.1
  done
}
This would require adding an additional thread in main.
There are many uarchs (e.g., Kaby Lake) where the majority of CPUs support AVX but not all of them do (e.g., Celeron), yet peakperf currently assumes that they all support AVX.
Commit c763c17 separated files by uarch, instead of the old system of separating backends by latencies, number of ALUs, and width. I think I introduced that change because I thought I had to use specific -march flags for each architecture, but that is not needed. Going back to the old system would reduce the lines of code and the code complexity.
Detect CPU microarchitecture (using cpufetch code) and select the right benchmark automatically